Your computation is symmetric, so you only need to compute the upper or
lower triangle, which will save both memory and time.
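
A minimal sketch of that idea, assuming NumPy is imported as np. The
function name dist2_upper and the progress_every parameter are made up here
for illustration and are not part of the original code. Each row's distances
to the later rows are computed in one vectorized step, so the row-by-row
progress output is kept, and the values are mirrored into the lower triangle.
Note that this roughly halves the arithmetic but still fills a dense N x N
array; to actually save memory you would have to store only the triangle,
for example the condensed form returned by scipy.spatial.distance.pdist.

```python
import sys
import numpy as np

def dist2_upper(data, progress_every=1000):
    """Squared Euclidean distances, computing only the upper triangle."""
    data = np.asarray(data, dtype=float)
    if data.ndim != 2:
        raise ValueError("data should be a matrix")
    n = data.shape[0]
    d2 = np.zeros((n, n))
    for i in range(n):
        # Distances from row i to rows i+1 .. n-1 in one vectorized step.
        diff = data[i + 1:] - data[i]
        row = (diff ** 2).sum(axis=1)
        d2[i, i + 1:] = row   # upper triangle
        d2[i + 1:, i] = row   # mirror into the lower triangle
        # Progress output (hypothetical interval, adjust as needed).
        if (i + 1) % progress_every == 0 or i + 1 == n:
            sys.stdout.write("\r%5d/%d" % (i + 1, n))
            sys.stdout.flush()
    sys.stdout.write("\n")
    return d2
```

On a small random matrix the result can be checked against
scipy.spatial.distance.squareform(pdist(data, 'sqeuclidean')) before running
it on the full 70,000-row input.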


On Tue, Oct 8, 2013 at 10:06 AM, Ke Sun <[email protected]> wrote:

> Dear all,
>
> I have written the following function to compute the squared distances of a
> large matrix (each sample is a row). It computes row by row and prints the
> overall progress. The progress output is important, so I didn't use matrix
> multiplication.
>
> I give as input a 70,000x800 matrix. The output should be a 70,000x70,000
> matrix. The program runs really slowly (16 hours for 1/3 of the progress),
> and it eats 36 GB of memory (fortunately I have enough).
>
> Could you give some insights on how to modify the code to be efficient and
> to eat less memory?
>
> thanks,
> Ke Sun
>
> def dist2_large( data ):
>     import time
>     import numpy as np
>     if data.ndim != 2: raise RuntimeError( "data should be a matrix" )
>     N,D = data.shape
>
>     print 'using the sample-wise implementation'
>     print '%d samples, %d dimensions' % (N,D)
>
>     start_t = time.time()
>     d2 = np.zeros( [N,N] )
>     for i in range( N ):
>         print "\r%5d/%d" % (i+1, N),
>         for j in range( N ):
>             d2[i,j] = ((data[i] - data[j])**2).sum()
>
>     total_t = time.time() - start_t
>     hours = (total_t / 3600)
>     minutes = (total_t % 3600) / 60
>     print "\nfinished in %2dh%2dm" % (hours, minutes)
>
>     return d2
>
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>