Re: [Numpy-discussion] sparse array data

Dag Sverre Seljebotn Thu, 03 May 2012 01:23:54 -0700

On 05/03/2012 06:27 AM, Travis Oliphant wrote:
>
> On May 2, 2012, at 10:03 PM, Stéfan van der Walt wrote:
>
>> On Wed, May 2, 2012 at 6:25 PM, Travis Oliphant<[email protected]>  wrote:
>>> The only new principle (which is not strictly new --- but new to NumPy's 
>>> world-view) is using one (or more) fields of a structured array as 
>>> "synthetic dimensions" which replace 1 or more of the raw table dimensions.
>>
>> Ah, thanks--that's the detail I was missing.  I wonder if the
>> contiguity requirement will hamper us here, though.  E.g., I could
>> imagine that some tree structure might be more suitable to storing and
>> organizing indices, and for large arrays we wouldn't like to make a
>> copy for each operation.  I guess we can't wait for discontiguous
>> arrays to come along, though :)
>
> Actually, it's better to keep the actual data together as much as possible, I 
> think, and simulate a tree structure with a layer on top --- i.e. an index.
>
> Different algorithms will prefer different orderings of the underlying data 
> just as today different algorithms prefer different striding patterns on the 
> standard, strided view of a dense array.


Two examples:

1) If you know the distribution of your indices in N dimensions ahead of 
time (e.g., roughly evenly particles in 3D space), you better fill that 
volume using a fractal, and use indices along that fractal, so that you 
get at least some spatial locality in all N dimensions. (This is 
standard procedure in parallelization codes)

2) For standard dense linear algebra code, it is better to store the 
data blocked than either contiguous ordering -- to the point that 
LAPACK/BLAS repacks to a blocked ordering internally on each operation.

Dag
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] sparse array data

Reply via email to