12, 2012 at 12:59 PM, Nathaniel Smith wrote:
> On Mon, Nov 12, 2012 at 9:08 PM, Nicolas SCHEFFER
> wrote:
>> I've pushed my code to a branch here
>> https://github.com/leschef/numpy/tree/faster_dot
>> with the commit
>> http
>>
>> http://dl.acm.org/citation.cfm?id=1356053
>>
>> (Googling for "Anatomy of High-Performance Matrix Multiplication" will
>> give you a preprint outside the paywall, but Google appears not to want
>> to give me the URL of a too-long search result so
>> is not a surprise for me. The latter is far more
>> cache friendly than the former. Everything follows cache lines, so it is
>> faster than something that will use one element from each cache line. In
>> fact, it is exactly what "proves" that the new version is correct.
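A minimal sketch (mine, not from the patch) of the contiguity point being made: `np.ascontiguousarray` copies only when the operand is not already C-contiguous, after which the rows line up with cache lines for the inner products.

```python
import numpy as np

a = np.random.rand(500, 300)
b = np.asfortranarray(np.random.rand(300, 400))  # F-ordered operand

# Copies only because b is not C-contiguous; a C-contiguous input
# would be passed through unchanged.
bc = np.ascontiguousarray(b)

assert not b.flags['C_CONTIGUOUS']
assert bc.flags['C_CONTIGUOUS']
assert np.allclose(np.dot(a, b), np.dot(a, bc))
```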
ht'] - right)**2).sum())
Out[28]: 0.015331409
#
# CCl
#
While the MSEs are small, I'm wondering whether:
- It's a bug: it should be exactly the same
- It's a feature: BLAS is taking shortcuts when you have A.A'. The
difference is not significant. Quick: PR that asap!
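For context, a sketch of the kind of shortcut BLAS can take for A.A' (my illustration, not part of the patch, and it assumes scipy is installed): the `?syrk` routines compute only one triangle of the symmetric product, so the result can differ from a general `gemm` in the last bits while agreeing to floating-point tolerance.

```python
import numpy as np
from scipy.linalg import blas

a = np.random.rand(6, 4)

# General matrix product, as np.dot would compute it.
full = np.dot(a, a.T)

# Symmetric rank-k update: fills only the upper triangle of a @ a.T.
c = blas.dsyrk(1.0, a)

# The triangles agree to tolerance, but need not be bitwise identical.
assert np.allclose(np.triu(c), np.triu(full))
```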
I don
I too encourage users to use scipy.linalg for speed and robustness
(hence calling this scipy.dot), but it just brings so much confusion!
When using the scipy + numpy ecosystem, you'd almost want everything
to be done with scipy so that you get the best implementation in all
cases: scipy.zeros(), scipy
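As an illustration of reaching for the scipy layer (my sketch, not code from the thread): `scipy.linalg.blas` exposes the BLAS routines directly, e.g. `dgemm`, which numpy's public API does not.

```python
import numpy as np
from scipy.linalg import blas

a = np.random.rand(4, 3)
b = np.random.rand(3, 5)

# Direct call into BLAS dgemm: c = alpha * (a @ b)
c = blas.dgemm(1.0, a, b)

assert np.allclose(c, np.dot(a, b))
```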
> blas version accepts the same stuff, so if this isn't in the current version,
> there will probably be some adjustment later on that side. What blas do you
> use? I think ATLAS was one that was causing problems.
>
>
> When we did this in Theano, it was more complicated than this di
wrote:
> On Fri, 2012-11-09 at 00:24 +0100, Sebastian Berg wrote:
>> Hey,
>>
>> On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote:
>> > Well, hinted by what Fabien said, I looked at the C level dot function.
>> > Quite verbose!
>> >
>> &
ht be too easy to be true.
On Thu, Nov 8, 2012 at 12:06 PM, Nicolas SCHEFFER
wrote:
> I've made the necessary changes to get the proper order for the output array.
> Also, a pass of pep8 and some tests (fixmes are in failing tests)
> http://pastebin.com/M8TfbURi
>
> -n
>
On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER
wrote:
Thanks for all the responses folks. This is indeed a nice problem to solve.
A few points:
I. Change the order from 'F' to 'C': I'll look into it.
II. Integration with scipy / numpy: opinions are diverging here.
Let's wait a bit to get more responses on what people think.
One thing though: I'd need t
Or just with a dot:
===
In [17]: np.tensordot(weights, matrices, (0,0))
Out[17]:
array([[ 5.,  5.,  5.],
       [ 5.,  5.,  5.]])
In [18]: np.dot(matrices.T, weights).T
Out[18]:
array([[ 5.,  5.,  5.],
       [ 5.,  5.,  5.]])
===
Make matrices.T C_CONTIGUOUS for maximum speed.
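A self-contained version of the session above; the shapes are my reconstruction (a length-5 weight vector against five (2, 3) matrices of ones, which reproduces the array of fives):

```python
import numpy as np

# Hypothetical shapes matching the printed output: 5 matrices of shape
# (2, 3), weighted by a length-5 vector of ones.
weights = np.ones(5)
matrices = np.ones((5, 2, 3))

a = np.tensordot(weights, matrices, (0, 0))  # contract axis 0 with axis 0
b = np.dot(matrices.T, weights).T            # same contraction via plain dot

assert a.shape == (2, 3)
assert np.allclose(a, 5.0)
assert np.allclose(a, b)
```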
-n
On Mon, Mar
Thalhammer wrote:
>
> Am 29.1.2011 um 22:01 schrieb Nicolas SCHEFFER:
>
>> Hi all,
>>
>> First email to the list for me, I just want to say how grateful I am
>> to have python+numpy+ipython etc... for my day to day needs. Great
>> combination of software.
Thanks for the prompt reply!
I quickly tried that and it actually helps compared to the fully
vectorized version.
Depending on the dimensions, the chunk size has to be tuned (typically
100 or so).
But I don't get any improvement w.r.t. the simple for loop (I can
almost match the time though).
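The chunking pattern being described can be sketched generically. This is my illustration, not the original code: the actual reduction in the algorithm is not shown in the thread, so a squared-distance computation stands in for it.

```python
import numpy as np

def chunked_sq_dists(x, centers, chunk=100):
    """Stand-in reduction showing the chunking pattern: process `chunk`
    rows at a time so the broadcast temporaries stay small (cache-sized),
    instead of materializing one large (n, k, d) intermediate."""
    out = np.empty((x.shape[0], centers.shape[0]))
    for i in range(0, x.shape[0], chunk):
        block = x[i:i + chunk]                        # (<=chunk, d)
        diff = block[:, None, :] - centers[None, :, :]
        out[i:i + chunk] = (diff ** 2).sum(axis=-1)
    return out

x = np.random.rand(250, 8)
centers = np.random.rand(4, 8)

# Fully vectorized version for comparison.
full = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
assert np.allclose(chunked_sq_dists(x, centers), full)
```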
My gue
Anyway, I've been having this bottleneck in one of my algorithms that
has been bugging me for quite a while.
The objective is to sp