Hey guys,
Thanks for all the suggestions. They seem already relatively involved, but I'll
have a look at the table implementation. That seems to be the easiest of them
all.
Cheers
Wolfgang
On 2012-05-03, at 9:39 AM, Charles R Harris wrote:
>
>
> On Thu, May 3, 2012 at 3:41 AM, Nathaniel Smith <[email protected]> wrote:
> On Thu, May 3, 2012 at 4:44 AM, Charles R Harris
> <[email protected]> wrote:
> >
> >
> > On Wed, May 2, 2012 at 3:20 PM, Nathaniel Smith <[email protected]> wrote:
> >> This coordinate format is also what's used by the MATLAB Tensor
> >> Toolbox. They have a paper justifying this choice and describing some
> >> tricks for how to work with them:
> >> http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1
> >> (Spoiler: you use a lot of sort operations. Conveniently, timsort
> >> appears to be perfectly adapted for their algorithmic requirements.)
> >>
> >
> > Timsort probably isn't the best choice here, it is optimized for python
> > lists of python objects where there is at least one level of indirection and
> > compares are expensive, even more expensive for compound objects. If the
> > coordinates are stored in numpy structured arrays an ordinary sort is likely
> > to be faster. Lexsort might even be faster as it could access aligned
> > integer data and not have to move lists of indexes around.
>
> To be clear, I don't mean Timsort-the-implementation, I mean
> Timsort-the-algorithm (which is now also the default sorting algorithm
> in Java). That said, it may well be optimized for expensive compares
> and something like a radix sort would be even better.
>
>
> Java uses Timsort for sorting object arrays (pointers) and dual pivot
> quicksort for sorting arrays of native types, ints and such. Timsort is very
> good for almost sorted arrays and the mixed algorithms are becoming popular,
> i.e., introsort and recent updates to the dual pivot sort that also look for
> runs. One of the reasons compares can be expensive for arrays of pointers to
> objects is that the objects can be located all over memory, which blows the
> cache.
>
> There are also a few mods to the numpy quicksort that might speed things up a
> bit more for common cases where there are a lot of repeated elements.
>
> In these sparse tensor algorithms, we often need to sort by one
> coordinate axis, and then sort by another (a "corner turn"). The
> reason Timsort seems appealing is that (1) it goes faster than O(n log
> n) when there is existing structure in the data being sorted, (2)
> because it's a stable sort, sorting on one axis then sorting on
> another will leave lots of that structure behind to be exploited
> later. So one can expect it to hit its happy case relatively often.
>
>
> Yes, that's why we have mergesort. An optimistic version making some use of
> runs might make it faster. We do have object arrays and no type specialized
> sort for them, so bringing Timsort in for those could be useful.
>
> Chuck
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion