Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-10-02 Thread Bruce Southey
On 09/30/2009 12:44 PM, Skipper Seabold wrote: > On Wed, Sep 30, 2009 at 12:56 PM, Bruce Southey wrote: > >> On 09/30/2009 10:22 AM, Skipper Seabold wrote: >> >>> On Tue, Sep 29, 2009 at 4:36 PM, Bruce Southey wrote: >>> >>> >>> Hi, The first case just has to handle

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-30 Thread Skipper Seabold
On Wed, Sep 30, 2009 at 12:56 PM, Bruce Southey wrote: > On 09/30/2009 10:22 AM, Skipper Seabold wrote: >> On Tue, Sep 29, 2009 at 4:36 PM, Bruce Southey wrote: >> >> >>> Hi, >>> The first case just has to handle a missing delimiter - actually I expect >>> that most of my cases would relate to this

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-30 Thread Bruce Southey
On 09/30/2009 10:22 AM, Skipper Seabold wrote: > On Tue, Sep 29, 2009 at 4:36 PM, Bruce Southey wrote: > > >> Hi, >> The first case just has to handle a missing delimiter - actually I expect >> that most of my cases would relate to this. So here is simple Python code to >> generate arbitrary lar

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-30 Thread Skipper Seabold
On Tue, Sep 29, 2009 at 4:36 PM, Bruce Southey wrote: > > Hi, > The first case just has to handle a missing delimiter - actually I expect > that most of my cases would relate to this. So here is simple Python code to > generate an arbitrarily large list with the occasional missing delimiter. > > I set it
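For readers following along, here is a rough sketch of the kind of test-data generator described in the quoted message: an arbitrarily long list of delimited rows in which a delimiter is occasionally missing. This is not Bruce's code, which the archive snippet cuts off; the function name `make_rows`, its parameters, and the 5% drop rate are all assumptions made for illustration.

```python
import random


def make_rows(n_rows, n_cols, drop_prob=0.01, delimiter=",", seed=0):
    """Return text rows; now and then one delimiter is left out (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    rows = []
    for _ in range(n_rows):
        fields = [str(rng.randint(0, 9)) for _ in range(n_cols)]
        if n_cols > 1 and rng.random() < drop_prob:
            # Glue two neighbouring fields together, which is what a
            # missing delimiter looks like to the reader of the file.
            k = rng.randrange(n_cols - 1)
            fields[k:k + 2] = [fields[k] + fields[k + 1]]
        rows.append(delimiter.join(fields))
    return rows


rows = make_rows(1000, 5, drop_prob=0.05)
# 0-based indexes of the rows that ended up one column short
bad = [i for i, line in enumerate(rows) if len(line.split(",")) != 5]
```

Writing `rows` to a file, or wrapping the joined text in a file-like object, should give genfromtxt something to fail on at known positions, so that any improved error message can be checked against `bad`.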

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-30 Thread Skipper Seabold
On Tue, Sep 29, 2009 at 4:36 PM, Bruce Southey wrote: > > Hi, > The first case just has to handle a missing delimiter - actually I expect > that most of my cases would relate to this. So here is simple Python code to > generate an arbitrarily large list with the occasional missing delimiter. > > I set it

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-29 Thread Christopher Barker
Pierre GM wrote: >> How does it handle the wrong number of tokens now? if an exception is >> raised somewhere, then that's the only place you'd need to do anything >> extra anyway. > > It silently fails outside the loop, when the list of split rows is > converted into an array: if one row has a

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-29 Thread Bruce Southey
On 09/29/2009 01:30 PM, Pierre GM wrote: On Sep 29, 2009, at 1:57 PM, Bruce Southey wrote: On 09/29/2009 11:37 AM, Christopher Barker wrote: Pierre GM wrote: Probably more than memory is the execution time involved in printing these problem rows. The rows with prob

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-29 Thread Pierre GM
On Sep 29, 2009, at 3:28 PM, Christopher Barker wrote: > > well, how does one test compare to: > > read the line from the file > split the line into tokens > parse each token > > I can't imagine it's significant, but I guess you only know with > profiling. That's on the parsing part. I'd like

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-29 Thread Christopher Barker
Pierre GM wrote: >> Another idea: only store the indexes of the rows that have the "wrong" >> number of columns -- if that's a large number, then the user has >> bigger >> problems than memory usage! > > That was my first idea, but then it adds tests in the inside loop > (which is what I'm tr

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-29 Thread Pierre GM
On Sep 29, 2009, at 1:57 PM, Bruce Southey wrote: > On 09/29/2009 11:37 AM, Christopher Barker wrote: >> Pierre GM wrote: >> > Probably more than memory is the execution time involved in printing > these problem rows. The rows with problems will be printed outside the loop (with at least an a

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-29 Thread Bruce Southey
On 09/29/2009 11:37 AM, Christopher Barker wrote: > Pierre GM wrote: > >> I was thinking about something this week-end: we could create a second >> list when looping on the rows, where we would store the length of each >> splitted row. After the loop, we can find if these values don't match >>

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-29 Thread Pierre GM
On Sep 29, 2009, at 12:37 PM, Christopher Barker wrote: > Pierre GM wrote: > Another idea: only store the indexes of the rows that have the "wrong" > number of columns -- if that's a large number, then the user has > bigger > problems than memory usage! That was my first idea, but then it add

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-29 Thread Christopher Barker
Pierre GM wrote: > I was thinking about something this week-end: we could create a second > list when looping on the rows, where we would store the length of each > splitted row. After the loop, we can find if these values don't match > the expected number of columns `nbcols` and where. Then,
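The idea quoted above (record the length of each split row in a second list, then compare against `nbcols` once the loop is done) might look roughly like the following. This is only a sketch under assumptions, not the genfromtxt implementation: the function name `split_rows` and the wording of the error message are made up, and only `nbcols` comes from the thread.

```python
def split_rows(lines, delimiter=",", nbcols=None):
    """Split lines into rows and check the column counts after the loop (hypothetical helper)."""
    rows = []
    lengths = []  # the "second list": one entry per split row
    for line in lines:
        values = line.rstrip("\n").split(delimiter)
        rows.append(values)
        lengths.append(len(values))  # no comparison inside the loop

    if nbcols is None:
        # Assumption: the first row defines the expected number of columns.
        nbcols = lengths[0] if lengths else 0

    # Only now look for the rows whose length does not match nbcols.
    bad = [(i + 1, n) for i, n in enumerate(lengths) if n != nbcols]
    if bad:
        details = ", ".join("line %d (got %d columns)" % (i, n) for i, n in bad)
        raise ValueError("Expected %d columns; mismatches at %s" % (nbcols, details))
    return rows
```

The trade-off discussed in the thread is visible here: the loop pays for one extra `append` per row and one integer per row of memory, while the comparison itself is deferred until after the loop.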

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-28 Thread Skipper Seabold
On Mon, Sep 28, 2009 at 1:36 PM, Pierre GM wrote: > > On Sep 28, 2009, at 12:51 PM, Skipper Seabold wrote: > >> This was probably due to the way that I timed it, honestly.  I only >> did it once.  The only differences I made for that part were in the >> first post of the thread.  Two incremented s

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-28 Thread Pierre GM
On Sep 28, 2009, at 12:51 PM, Skipper Seabold wrote: > This was probably due to the way that I timed it, honestly. I only > did it once. The only differences I made for that part were in the > first post of the thread. Two incremented scalars for line numbers > and column numbers and a try/exc
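For comparison, the approach Skipper describes (two incremented scalars for the line and column numbers plus a try/except around the conversion) could be sketched as below. This is an illustration under assumptions rather than the actual patch, which is not shown in the archive; `convert_rows` and the message format are hypothetical.

```python
def convert_rows(lines, converters, delimiter=","):
    """Convert each field, reporting line and column on failure (hypothetical helper)."""
    converted = []
    line_no = 0
    for line in lines:
        line_no += 1  # first incremented scalar: current line (1-based)
        values = line.split(delimiter)
        col_no = 0
        try:
            row = []
            for conv, value in zip(converters, values):
                col_no += 1  # second incremented scalar: current column (1-based)
                row.append(conv(value))
        except ValueError as err:
            # Re-raise with the position attached, so the user can find the bad field.
            raise ValueError("line %d, column %d: %s" % (line_no, col_no, err))
        converted.append(tuple(row))
    return converted
```

For example, `convert_rows(["1,2.5", "x,4.0"], [int, float])` raises a ValueError whose message starts with "line 2, column 1", since `int("x")` fails on the first field of the second line.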

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-28 Thread Skipper Seabold
On Mon, Sep 28, 2009 at 12:41 PM, Christopher Barker wrote: > Skipper Seabold wrote: >> FWIW, I have a script that creates and savez arrays from several text >> files in total about 1.5 GB of text. >> >> without the incrementing in genfromtxt >> >> Run time: 122.043943 seconds >> >> with the incre

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-28 Thread Christopher Barker
Skipper Seabold wrote: > FWIW, I have a script that creates and savez arrays from several text > files in total about 1.5 GB of text. > > without the incrementing in genfromtxt > > Run time: 122.043943 seconds > > with the incrementing in genfromtxt > > Run time: 131.698873 seconds > > If we j
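To put the two run times quoted above in proportion, the difference works out to a bit under ten seconds, or roughly 8% of the baseline. The small calculation below only restates the figures from the message; the variable names are mine, and, as noted elsewhere in the thread, each timing was taken once, so the result should be read as a rough indication.

```python
baseline = 122.043943       # seconds, without the incrementing in genfromtxt
with_counters = 131.698873  # seconds, with the incrementing in genfromtxt

extra = with_counters - baseline  # absolute cost of the extra bookkeeping
overhead = extra / baseline       # the same cost relative to the baseline run

print("extra time: %.2f s (%.1f%% of baseline)" % (extra, 100 * overhead))
```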

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Skipper Seabold
On Fri, Sep 25, 2009 at 3:51 PM, Christopher Barker wrote: > One more thought: > > Pierre GM wrote: That way, we don't slow things down when everything works, > > how long can it really take to increment an integer as each line is > parsed? I'd suspect no one would even notice! > A 1000

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Pierre GM
On Sep 25, 2009, at 3:56 PM, Ralf Gommers wrote: > > The examples you put in the docstring are good I think. One more > example demonstrating missing values would be useful. And +1 to a > page in the user guide for anything else. Check also what's done in tests/test_io.py, that gives an idea

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Ralf Gommers
On Fri, Sep 25, 2009 at 3:47 PM, Pierre GM wrote: > > On Sep 25, 2009, at 3:42 PM, Skipper Seabold wrote: > > > > As far as this goes, I added some examples to the docs wiki, but I > > think that genfromtxt and related would be best served by having their > > own wiki page that could maybe go her

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Pierre GM
On Sep 25, 2009, at 3:51 PM, Skipper Seabold wrote: >> >> While you're at it, can you ask for adding the possibility to process >> a dtype like (int,int,float) ? That was what I was working on >> before I >> started installing Snow Leopard... > > Sure. Should it be another keyword though `type`

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Skipper Seabold
On Fri, Sep 25, 2009 at 3:47 PM, Pierre GM wrote: > > On Sep 25, 2009, at 3:42 PM, Skipper Seabold wrote: >> >> As far as this goes, I added some examples to the docs wiki, but I >> think that genfromtxt and related would be best served by having their >> own wiki page that could maybe go here >>

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Christopher Barker
One more thought: Pierre GM wrote: >>> That way, we don't slow >>> things down when everything works, how long can it really take to increment an integer as each line is parsed? I'd suspect no one would even notice! >>if you don't keep track of where you are, wouldn't you >> need to re-parse t

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Pierre GM
On Sep 25, 2009, at 3:42 PM, Skipper Seabold wrote: > > As far as this goes, I added some examples to the docs wiki, but I > think that genfromtxt and related would be best served by having their > own wiki page that could maybe go here > > > Tho

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Skipper Seabold
On Fri, Sep 25, 2009 at 3:34 PM, Bruce Southey wrote: >>> * About the converter error: there's indeed a bug in >>> StringConverter.upgrade, I need to write some unittests to make sure I >>> get it covered. If you could get me some sample code, that'd be great. >>> >> Hmm, I'm not sure that the er

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Pierre GM
On Sep 25, 2009, at 3:12 PM, Christopher Barker wrote: > Pierre GM wrote: >> That way, we don't slow >> things down when everything works, but just add some delay when they >> don't. > > good goal, but if you don't keep track of where you are, wouldn't you > need to re-parse the whole file to fig

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Bruce Southey
On 09/25/2009 01:25 PM, Skipper Seabold wrote: > On Fri, Sep 25, 2009 at 2:17 PM, Pierre GM wrote: > >> Sorry all, I haven't been as respondent as I wished lately... >> * About the patch: I don't like the idea of adding yet some other >> tests in the main loop. I was more into letting things l

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Ralf Gommers
On Fri, Sep 25, 2009 at 3:08 PM, Christopher Barker wrote: > Ralf Gommers wrote: > > Probably the easiest for your purpose is this: > > > > def divbyzero(): > > return 1/0 > > > > try: > > a = divbyzero() > > except ZeroDivisionError as err: > > print 'problem occurred at line X' > >

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Christopher Barker
Pierre GM wrote: > That way, we don't slow > things down when everything works, but just add some delay when they > don't. good goal, but if you don't keep track of where you are, wouldn't you need to re-parse the whole file to figure it out again? Maybe a "debug" mode that the user could tu

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Christopher Barker
Ralf Gommers wrote: > Probably the easiest for your purpose is this: > > def divbyzero(): > return 1/0 > > try: > a = divbyzero() > except ZeroDivisionError as err: > print 'problem occurred at line X' > raise err I get an error with this syntax -- is this 2.6 only? In [10]: run
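On the syntax question: the `except SomeError as err` form was introduced in Python 2.6, so Python 2.5 and earlier reject it and need the older `except SomeError, err` spelling. Below is a sketch of the quoted example written so that it runs on current Python 3; the wrapper function `run` is my own addition, and a bare `raise` is used to re-raise the original exception with its traceback intact.

```python
def divbyzero():
    return 1 / 0


def run():
    try:
        return divbyzero()
    # "as err" needs Python 2.6 or later; older 2.x releases only accept
    # "except ZeroDivisionError, err:" here.
    except ZeroDivisionError as err:
        print("problem occurred at line X: %s" % err)
        raise  # re-raise the same exception, keeping the original traceback
```

Calling `run()` prints the message and then propagates the ZeroDivisionError to the caller.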

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Skipper Seabold
On Fri, Sep 25, 2009 at 2:17 PM, Pierre GM wrote: > Sorry all, I haven't been as respondent as I wished lately... > * About the patch: I don't like the idea of adding yet some other > tests in the main loop. I was more into letting things like they are, > but calling some error function if some 's

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Pierre GM
Sorry all, I haven't been as respondent as I wished lately... * About the patch: I don't like the idea of adding yet some other tests in the main loop. I was more into letting things like they are, but calling some error function if some 'setting an array element with a sequence' exception is

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Skipper Seabold
On Fri, Sep 25, 2009 at 2:03 PM, Bruce Southey wrote: > On 09/25/2009 12:00 PM, Skipper Seabold wrote: >> There have been some recent attempts to improve the error reporting in >> genfromtxt, which is >> great, because hunting down the problems reading

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Skipper Seabold
On Fri, Sep 25, 2009 at 2:00 PM, Ralf Gommers wrote: > > > On Fri, Sep 25, 2009 at 1:00 PM, Skipper Seabold > wrote: >> >> There have been some recent attempts to improve the error reporting in >> genfromtxt, which is >> great, because hunting down th

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Bruce Southey
On 09/25/2009 12:00 PM, Skipper Seabold wrote: > There have been some recent attempts to improve the error reporting in > genfromtxt, which is > great, because hunting down the problems reading in big and messy > files is not fun. > > I am working on a p

Re: [Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Ralf Gommers
On Fri, Sep 25, 2009 at 1:00 PM, Skipper Seabold wrote: > There have been some recent attempts to improve the error reporting in > genfromtxt, which is > great, because hunting down the problems reading in big and messy > files is not fun. > > I am wor

[Numpy-discussion] Question about improving genfromtxt errors

2009-09-25 Thread Skipper Seabold
There have been some recent attempts to improve the error reporting in genfromtxt, which is great, because hunting down the problems reading in big and messy files is not fun. I am working on a patch that keeps up with the line number and column number