[Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

2016-01-14 Thread Ryan R. Rosario
Hi, I have a very large dictionary that must be shared across processes and does not fit in RAM. I need access to this object to be fast. The key is an integer ID and the value is a list containing two elements, both of them numpy arrays (one has ints, the other has floats). The key is sequenti

Re: [Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

2016-01-14 Thread Nathaniel Smith
I'd try storing the data in hdf5 (probably via h5py, which is a more basic interface without all the bells-and-whistles that pytables adds), though any method you use is going to be limited by the need to do a seek before each read. Storing the data on SSD will probably help a lot if you can afford

Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-14 Thread Oscar Benjamin
On 13 January 2016 at 22:23, Chris Barker wrote: > On Mon, Jan 11, 2016 at 5:29 PM, Nathaniel Smith wrote: >> >> I agree that talking about such things on distutils-sig tends to elicit a >> certain amount of puzzled incomprehension, but I don't think it matters -- >> wheels already have everythin

Re: [Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

2016-01-14 Thread Francesc Alted
Well, maybe something like a simple class emulating a dictionary that stores key-value pairs on disk would be more than enough. Then you can use whatever persistence layer you want (even HDF5, but not necessarily). As a demonstration I did a quick and dirty implementation for such a persistent k
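The dictionary-emulating class described above could look roughly like the following. This is a minimal sketch, not the implementation from the message (which is truncated here): the class name `DiskDict`, the one-`.npz`-file-per-key layout, and the field names `ints` and `floats` are all assumptions chosen for illustration.

```python
import os
import tempfile
from collections.abc import MutableMapping

import numpy as np


class DiskDict(MutableMapping):
    """Hypothetical dict-like store: one .npz file per integer key."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Assumption: keys are integer IDs, as in the original question.
        return os.path.join(self.directory, "%d.npz" % int(key))

    def __setitem__(self, key, value):
        ints, floats = value  # the two arrays stored under each key
        np.savez(self._path(key), ints=ints, floats=floats)

    def __getitem__(self, key):
        path = self._path(key)
        if not os.path.exists(path):
            raise KeyError(key)
        with np.load(path) as data:
            return [data["ints"], data["floats"]]

    def __delitem__(self, key):
        path = self._path(key)
        if not os.path.exists(path):
            raise KeyError(key)
        os.remove(path)

    def __iter__(self):
        for name in os.listdir(self.directory):
            if name.endswith(".npz"):
                yield int(name[:-4])

    def __len__(self):
        return sum(1 for _ in self)


# Usage: only the requested entry is read from disk.
store = DiskDict(tempfile.mkdtemp())
store[7] = [np.arange(3), np.linspace(0.0, 1.0, 3)]
ints, floats = store[7]
```

One file per key keeps the sketch short but costs a file open on every lookup; a single container file (HDF5 or otherwise) behind the same interface would avoid that.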

Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-14 Thread James E.H. Turner
On 09/01/16 00:13, Nathaniel Smith wrote: Right. There's a small problem which is that the base linux system isn't just "CentOS 5", it's "CentOS 5 and here's the list of libraries that you're allowed to link to: ...", where that list is empirically chosen to include only stuff that really is inst

Re: [Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

2016-01-14 Thread Edison Gustavo Muenz
From what I know, this would be the use case that Dask seems to solve. I think this blog post can help: https://www.continuum.io/content/xray-dask-out-core-labeled-arrays-python Note that I haven't used any of these projects myself. On Thu, Jan 14, 2016 at 11:48 AM, Francesc Alted wrote: > W

Re: [Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

2016-01-14 Thread Benjamin Root
A warning about HDF5. It is not a database format, so you have to be extremely careful if the data is getting updated while it is open for reading by anybody else. If it is strictly read-only, and nobody else is updating it, then have at it! Cheers! Ben Root On Thu, Jan 14, 2016 at 9:16 AM, Edis

Re: [Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

2016-01-14 Thread Travis Oliphant
On Thu, Jan 14, 2016 at 8:16 AM, Edison Gustavo Muenz < edisongust...@gmail.com> wrote: > From what I know this would be the use case that Dask seems to solve. > > I think this blog post can help: > https://www.continuum.io/content/xray-dask-out-core-labeled-arrays-python > > Notice that I haven't

Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-14 Thread Chris Barker - NOAA Federal
>> Also, you have the problem that there is one PyPi -- so where do you put >> your nifty wheels that depend on other binary wheels? you may need to fork >> every package you want to build :-( > > Is this a real problem or a theoretical one? Do you know of some > situation where this wheel to wheel

Re: [Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

2016-01-14 Thread Feng Yu
Hi Ryan, Did you consider packing the arrays into one (or two) giant arrays stored with mmap? That way you only need to store the start & end offsets, and there is no need to use a dictionary. It may allow you to simplify some numerical operations as well. To be more specific, start : numpy.intp end
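The packing idea above can be sketched with `numpy.memmap`. This is an illustration under assumptions, not the code from the message (which is truncated here): the helper names `pack` and `lookup` are hypothetical, keys are assumed to be sequential integers starting at 0, and the list of arrays is assumed to be non-empty with at least one element overall.

```python
import os
import tempfile

import numpy as np


def pack(arrays, path, dtype):
    """Concatenate 1-D arrays into one flat file; return start/end offsets."""
    lengths = np.array([len(a) for a in arrays], dtype=np.intp)
    end = np.cumsum(lengths)   # exclusive end offset of each array
    start = end - lengths      # inclusive start offset of each array
    flat = np.memmap(path, dtype=dtype, mode="w+", shape=(int(end[-1]),))
    for a, s, e in zip(arrays, start, end):
        flat[s:e] = a
    flat.flush()
    return start, end


def lookup(path, dtype, start, end, key):
    """Return the array for integer `key`, reading only its slice of the file."""
    flat = np.memmap(path, dtype=dtype, mode="r")
    return np.array(flat[start[key]:end[key]])


# Usage: the offsets replace the dictionary; the key is just an index.
path = os.path.join(tempfile.mkdtemp(), "floats.bin")
start, end = pack([np.array([1.0, 2.0]), np.array([3.0])], path, np.float64)
print(lookup(path, np.float64, start, end, 1))
```

With two value arrays per key (ints and floats), the same scheme would use two flat files, each with its own pair of offset arrays.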

Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-14 Thread Matthew Brett
On Thu, Jan 14, 2016 at 9:14 AM, Chris Barker - NOAA Federal wrote: >>> Also, you have the problem that there is one PyPi -- so where do you put >>> your nifty wheels that depend on other binary wheels? you may need to fork >>> every package you want to build :-( >> >> Is this a real problem or a

Re: [Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

2016-01-14 Thread Stephan Hoyer
On Thu, Jan 14, 2016 at 8:26 AM, Travis Oliphant wrote: > I don't know enough about xray to know whether it supports this kind of > general labeling to be able to build your entire data-structure as an x-ray > object. Dask could definitely be used to process your data in an easy to > describe m

Re: [Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

2016-01-14 Thread Nathaniel Smith
On Thu, Jan 14, 2016 at 2:13 PM, Stephan Hoyer wrote: > On Thu, Jan 14, 2016 at 8:26 AM, Travis Oliphant > wrote: >> >> I don't know enough about xray to know whether it supports this kind of >> general labeling to be able to build your entire data-structure as an x-ray >> object. Dask could de

Re: [Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

2016-01-14 Thread Stephan Hoyer
On Thu, Jan 14, 2016 at 2:30 PM, Nathaniel Smith wrote: > The reason I didn't suggest dask is that I had the impression that > dask's model is better suited to bulk/streaming computations with > vectorized semantics ("do the same thing to lots of data" kinds of > problems, basically), whereas it

[Numpy-discussion] inconsistency in np.isclose

2016-01-14 Thread Andrew Nelson
Hi all, I think there is an inconsistency with np.isclose when I compare two numbers: >>> np.isclose(0, np.inf) array([False], dtype=bool) >>> np.isclose(0, 1) False The first comparison returns a bool array, the second returns a bool. Shouldn't they both return the same result? -- ___
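A quick way to check whether a given NumPy installation shows the inconsistency reported above is to compare the dimensionality of the two results. This is only a diagnostic sketch; the helper name `describe` is hypothetical, and what it prints depends on the installed NumPy version.

```python
import numpy as np


def describe(result):
    """Return the number of dimensions and the type name of a NumPy result."""
    return np.ndim(result), type(result).__name__


# The report above saw a 1-element array for np.inf and a scalar for 1.
for other in (1, np.inf):
    print(other, describe(np.isclose(0, other)))
```

If both calls report zero dimensions, the two scalar comparisons behave consistently on that installation.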

Re: [Numpy-discussion] inconsistency in np.isclose

2016-01-14 Thread Nathaniel Smith
Yeah, that does look like a bug. On Thu, Jan 14, 2016 at 4:48 PM, Andrew Nelson wrote: > Hi all, > I think there is an inconsistency with np.isclose when I compare two > numbers: > np.isclose(0, np.inf) > array([False], dtype=bool) > np.isclose(0, 1) > False > > The first comparison ret