On 12/13/11, numpy-discussion-requ...@scipy.org <numpy-discussion-requ...@scipy.org> wrote:
> Today's Topics:
>
>    1. Re: Fast Reading of ASCII files (Chris Barker)
>    2. Re: Apparently non-deterministic behaviour of complex array
>       multiplication (kneil)
>    3. Re: numpy.mean problems (Eraldo Pomponi)
>    4. Re: Fast Reading of ASCII files (Bruce Southey)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 13 Dec 2011 10:08:44 -0800
> From: Chris Barker <chris.bar...@noaa.gov>
> Subject: Re: [Numpy-discussion] Fast Reading of ASCII files
> To: denis <denis-bz...@t-online.de>, Discussion of Numerical Python
>     <numpy-discussion@scipy.org>
> Message-ID:
>     <calgmxejt9y0oam1gkfsuflwabjnxflk54x-n8+f8ht5vzjc...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> NOTE:
>
> Let's keep this on the list.
>
> On Tue, Dec 13, 2011 at 9:19 AM, denis <denis-bz...@t-online.de> wrote:
>
>> Chris,
>> unified, consistent save / load is a nice goal
>>
>> 1) header lines with date, pwd etc.: "where'd this come from ?"
>>
>>     # (5, 5) svm.py bz/py/ml/svm 2011-12-13 Dec 11:56 -- automatic
>>     # 80.6 % correct -- user info
>>     245 39 4 5 26
>>     ...
>>
> I'm not sure I understand what you are expecting here: what would be
> automatic? If it parses a datetime in the header, what would it do with it?
> But anyway, this seems to me:
> - very application specific -- this is for the user's code to write
> - not what we are talking about at this point anyway -- I think this
>   discussion is about a lower-level, does-the-simple-things-fast reader --
>   that may or may not be able to form the basis of a higher-level,
>   fuller-featured reader.
>
>> 2) read any CSVs: comma or blank-delimited, with/without column names,
>>    a la loadcsv() below
>>
> yup -- though the column name reading would be part of a higher-level
> reader as far as I'm concerned.
>
>> 3) sparse or masked arrays ?
>>
> sparse probably not, that seems pretty domain dependent to me -- though
> hopefully one could build such a thing on top of the lower-level reader.
> Masked support would be good -- once we're convinced what the future of
> masked arrays is in numpy. I was thinking that the masked array issue
> would really be a higher-level feature -- it certainly could be if you need
> to mask "special value" style files (i.e. 9999), but we may have to build
> it into the lower-level reader for cases where the mask is specified by
> non-numerical values -- i.e. there are some met files that use "MM" or some
> other text, so you can't put it into a numerical array first.
>
>> Longterm wishes: beyond the scope of one file <-> one array
>> but essential for larger projects:
>> 1) dicts / dotdicts:
>>    Dotdict( A=anysizearray, N=scalar ... ) <-> a directory of little files
>>    is easy, better than np.savez
>>    (Haven't used hdf5, I believe Matlab v7 does.)
>>
>> 2) workflows: has anyone there used visTrails ?
>>
> outside of the spec of this thread...
>
>> Anyway it seems to me (old grey cynic) that Numpy/scipy developers
>> prefer to code first, spec and doc later. Too pessimistic ?
>>
> Well, I think many of us believe in a more agile-style approach --
> incremental development. But really, as an open source project, it's
> about scratching an itch -- so there is usually a spec in mind for the itch
> at hand. In this case, however, that has been a weakness -- clearly a
> number of us have written small solutions to our particular problem at hand,
> but no, we haven't arrived at a more general-purpose solution yet. So a bit
> of spec-ing ahead of time may be called for.
>
> On that:
>
> I've been thinking from the bottom up -- imagining what I need for the
> simple case, and how it might apply to more complex cases -- but maybe we
> should think about this another way:
>
> What we're talking about here is really about core software engineering --
> optimization. It's easy to write a simple pure-Python file parser, and
> reasonable to write a complex one (genfromtxt) -- the issue is performance
> -- we need some more C (or Cython) code to really speed it up, but none of
> us wants to write the complex-case code in C. So:
>
> genfromtxt is really nice for many of the complex cases. So perhaps
> another approach is to look at genfromtxt, and see what
> high-performance lower-level functionality we could develop that could make
> it fast -- then we are done.
>
> This actually mirrors exactly what we all usually recommend for Python
> development in general -- write it in Python, then, if it's really not fast
> enough, write the bottleneck in C.
>
> So where are the bottlenecks in genfromtxt? Are there self-contained
> portions that could be rewritten in C/Cython?
>
> -Chris
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA 98115        (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> http://mail.scipy.org/pipermail/numpy-discussion/attachments/20111213/2b6d09f4/attachment-0001.html
>
> ------------------------------
>
> Message: 2
> Date: Tue, 13 Dec 2011 10:13:31 -0800 (PST)
> From: kneil <magnetotellur...@gmail.com>
> Subject: Re: [Numpy-discussion] Apparently non-deterministic behaviour
>     of complex array multiplication
> To: numpy-discussion@scipy.org
> Message-ID: <32969114.p...@talk.nabble.com>
> Content-Type: text/plain; charset=us-ascii
>
> Hi Olivier,
> Sorry for the late reply - I have been on travel.
> I have encountered the error in two separate cases: when I was using numpy
> arrays, and when I was using numpy matrices.
> In the case of a numpy array (Y), the operation is:
>     dot(Y,Y.conj().transpose())
> and in the case of a matrix, with X=asmatrix(Y), the operation is:
>     X*X.H
> -Karl
>
> Olivier Delalleau-2 wrote:
>>
>> I was trying to see if I could reproduce this problem, but your code fails
>> with numpy 1.6.1 with:
>>     AttributeError: 'numpy.ndarray' object has no attribute 'H'
>> Is X supposed to be a regular ndarray with dtype = 'complex128', or
>> something else?
>>
>> -=- Olivier
>>
>
> --
> View this message in context:
> http://old.nabble.com/Apparently-non-deterministic-behaviour-of-complex-array-multiplication-tp32893004p32969114.html
> Sent from the Numpy-discussion mailing list archive at Nabble.com.
>
> ------------------------------
>
> Message: 3
> Date: Tue, 13 Dec 2011 20:04:22 +0100
> From: Eraldo Pomponi <eraldo.pomp...@gmail.com>
> Subject: Re: [Numpy-discussion] numpy.mean problems
> To: Discussion of Numerical Python <numpy-discussion@scipy.org>
> Message-ID:
>     <caeacg7eaovwwqbm3xkjz8jzp3xgqks8rckkvcgi6eddrsgp...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Fred,
>
> I would suggest you have a look at pandas (http://pandas.sourceforge.net/).
> It was really helpful for me. It seems well suited to the type of data that
> you are working with.
> It has nice "broadcasting" capabilities to apply numpy functions to a
> set column.
> http://pandas.sourceforge.net/basics.html#descriptive-statistics
> http://pandas.sourceforge.net/basics.html#function-application
>
> Cheers,
> Eraldo
>
> On Sun, Dec 11, 2011 at 1:49 PM, ferreirafm <ferreir...@lim12.fm.usp.br> wrote:
>
>> Aronne Merrelli wrote:
>> >
>> > I can recreate this error if tab is a structured ndarray - what is the
>> > dtype of tab?
>> >
>> > If that is correct, I think you could fix this by simplifying things.
>> > Since tab is already an ndarray, you should not need to convert it back
>> > into a python list. By converting the ndarray back to a list you are
>> > making an extra level of "wrapping" as a python object, which is
>> > ultimately why you get that error about adding numpy.void.
>> >
>> > Unfortunately you cannot directly take a mean of a struct dtype;
>> > structs are generic so they could have fields with strings, or objects,
>> > etc, that would be invalid for a mean calculation. However the following
>> > code fragment should work pretty efficiently. It will make a 1-element
>> > array of the same dtype as tab, and then populate it with the mean value
>> > of all elements where the length is >= 15. Note that
>> > dtype.fields.keys() gives you a nice way to iterate over the fields in
>> > the struct dtype:
>> >
>> >     length_mask = tab['length'] >= 15
>> >     tab_means = np.zeros(1, dtype=tab.dtype)
>> >     for k in tab.dtype.fields.keys():
>> >         tab_means[k] = np.mean( tab[k][length_mask] )
>> >
>> > In general this would not work if tab has a field that is not a simple
>> > numeric type, such as a str, object, ... But it looks like your arrays
>> > are all numeric from your example above.
>> >
>> > Hope that helps,
>> > Aronne
>> >
>> Hi Aronne,
>> Thanks for your reply. Indeed, tab is a mix of different column types:
>> tab.dtype:
>> [('sgi', '<i8'), ('length', '<i8'), ('nident', '<i8'), ('pident', '<f8'),
>> ('positive', '<i8'), ('ppos', '<f8'), ('mismatch', '<i8'), ('qstart',
>> '<i8'), ('qend', '<i8'), ('sstart', '<i8'), ('send', '<i8'), ('gapopen',
>> '<i8'), ('gaps', '<i8'), ('evalue', '<f8'), ('bitscore', '<f8'), ('score',
>> '<f8')]
>> Interestingly, I wasn't able to import some columns of digits as
>> strings, as one can with R data frame objects.
>> I'll try to adapt your example to my needs and let you know the results.
>> Regards.
>>
>> --
>> View this message in context:
>> http://old.nabble.com/numpy.mean-problems-tp32945124p32955052.html
>> Sent from the Numpy-discussion mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 13 Dec 2011 13:29:47 -0600
> From: Bruce Southey <bsout...@gmail.com>
> Subject: Re: [Numpy-discussion] Fast Reading of ASCII files
> To: numpy-discussion@scipy.org
> Message-ID: <4ee7a7ab.8060...@gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On 12/13/2011 12:08 PM, Chris Barker wrote:
>> [...]
>>
>> genfromtxt is really nice for many of the complex cases. So perhaps
>> another approach is to look at genfromtxt, and see what
>> high-performance lower-level functionality we could develop that could
>> make it fast -- then we are done.
>>
>> This actually mirrors exactly what we all usually recommend for Python
>> development in general -- write it in Python, then, if it's really not
>> fast enough, write the bottleneck in C.
>>
>> So where are the bottlenecks in genfromtxt? Are there self-contained
>> portions that could be rewritten in C/Cython?
>>
>> -Chris
>>
> Reading data is hard and writing code that suits the diversity in the
> Numerical Python community is even harder!
>
> Both the loadtxt and genfromtxt functions (other functions are perhaps less
> important) perhaps need an upgrade to incorporate the new NA object. I
> think that adding the NA object will simplify some of the process because
> invalid data (missing or a string in a numerical format) can be set to
> NA without requiring the creation of a new masked array or returning an
> error.
>
> Here I think loadtxt is a better target than genfromtxt because, as I
> understand it, it assumes the user really knows the data, whereas
> genfromtxt can ask the data for the appropriate format.
>
> So I agree that a new 'superfast custom CSV reader for well-behaved data'
> function would be rather useful, especially as a replacement for
> loadtxt. By that I mean reading data using a user-specified format that
> essentially follows the CSV format
> (http://en.wikipedia.org/wiki/Comma-separated_values) - it needs to
> allow for the NA object, skipping lines and user-defined delimiters.
>
> Bruce
>
> ------------------------------
>
> End of NumPy-Discussion Digest, Vol 63, Issue 43
> ************************************************
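
For what it's worth, the reader Bruce describes (user-defined delimiter, skipped lines, missing values) can be roughly approximated today with genfromtxt. The snippet below is only a sketch under my own assumptions: the sample text and the "MM" missing-value marker are made up for illustration, and NaN stands in for the proposed NA object. It reads to a plain float array first and then builds the mask on top, which is the lower-level / higher-level split Chris suggests. It does nothing about the speed issue.

```python
import io
import numpy as np

# Hypothetical sample file: two lines to skip, then comma-delimited rows,
# with the text marker "MM" standing in for a missing value.
SAMPLE = """example export, 2011-12-13
col_a,col_b,col_c
1.0,2.0,3.0
4.0,MM,6.0
7.0,8.0,9.0
"""

# Lower-level step: read the numbers, turning "MM" into NaN.
data = np.genfromtxt(
    io.StringIO(SAMPLE),
    delimiter=",",          # user-defined delimiter
    skip_header=2,          # skip the two leading lines
    missing_values="MM",    # text that marks a missing entry
    filling_values=np.nan,  # value stored in its place
)

# Higher-level step: build a masked array on top of the plain ndarray.
masked = np.ma.masked_invalid(data)

print(data)
print(masked.mean())  # mean over the eight valid entries only
```

The same call pattern should carry over to a real file by passing a filename instead of the StringIO object.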
-- 
Sent from my mobile device

"Reasonable people adapt themselves to the world. Unreasonable people
attempt to adapt the world to themselves. All progress, therefore,
depends on unreasonable people." - G.B. Shaw