On Wed, Mar 26, 2014 at 4:28 PM, Slaunger <[email protected]> wrote:
> jseabold wrote
>> IIUC,
>>
>> [~/]
>> [1]: np.logical_and([True, False, True], [False, False, True])
>> [1]: array([False, False, True], dtype=bool)
>>
>> You can avoid looping over k since they're all the same length
>>
>> [~/]
>> [3]: np.logical_and([[True, False],[False, True],[False, True]],
>>                     [[False, False], [False, True], [True, True]])
>> [3]:
>> array([[False, False],
>>        [False,  True],
>>        [False,  True]], dtype=bool)
>>
>> [~/]
>> [4]: np.sum(np.logical_and([[True, False],[False, True],[False, True]],
>>                            [[False, False], [False, True], [True, True]]), axis=0)
>> [4]: array([0, 2])
>
> Well, yes, if you work with the pure f_k and g_k that is true, but this
> two-dimensional array will have 4*10^14 elements and will exhaust my memory.
>
> That is why I have found a more efficient method for finding only the much
> fewer changes_at elements for each k. These arrays have unequal lengths
> and have to be considered for each k (which is tolerable as long as I avoid a
> further inner loop for each k in explicit Python).
>
> I could implement this in C and get it done sufficiently efficiently. I would
> just like to make a point of demonstrating that this is also doable in finite
> time in Python/numpy.
>
If you want to attack it straight on and keep it conceptually simple,
this looks like it would work. Fair warning, I've never done this and
have no idea if it's actually efficient in memory or computation, so
I'd be interested to hear from experts. I just wanted to see if it
would work from disk. I wonder if a solution using PyTables would be
faster.

Provided that you can chunk your data into a memmap array, then
something you *could* do is:

import numpy as np

N = 2*10**7
chunk_size = 100000
farr1 = 'scratch/arr1'  # the scratch/ directory must already exist
farr2 = 'scratch/arr2'

# create two on-disk arrays and fill them with random 0/1 data, chunk by chunk
arr1 = np.memmap(farr1, dtype='uint8', mode='w+', shape=(N, 4))
arr2 = np.memmap(farr2, dtype='uint8', mode='w+', shape=(N, 4))

for i in range(0, N, chunk_size):
    arr1[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)
    arr2[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)

# deleting the memmaps flushes the data to disk
del arr1
del arr2

# reopen read-only
arr1 = np.memmap(farr1, mode='r', dtype='uint8', shape=(N, 4))
arr2 = np.memmap(farr2, mode='r', dtype='uint8', shape=(N, 4))

# per-column count of rows where both arrays are nonzero, one chunk at a time
equal = np.logical_and(arr1[:chunk_size], arr2[:chunk_size]).sum(0)

for i in range(chunk_size, N, chunk_size):
    equal += np.logical_and(arr1[i:i+chunk_size], arr2[i:i+chunk_size]).sum(0)

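
To sanity-check the chunked accumulation without writing gigabytes to
disk, something like the following small-scale sketch might help. The
helper name chunked_and_count, the temporary file paths and the array
sizes are all made up for illustration. It starts the accumulator at
zero rather than seeding it with the first chunk, and the row count is
deliberately not a multiple of chunk_size to show that slicing handles
a short last chunk.

```python
import os
import tempfile

import numpy as np


def chunked_and_count(a, b, chunk_size):
    """Per-column count of rows where both a and b are nonzero.

    Hypothetical helper: only one chunk of rows is processed at a time.
    """
    total = np.zeros(a.shape[1], dtype=np.int64)
    for start in range(0, a.shape[0], chunk_size):
        stop = start + chunk_size  # slicing clips this at the end of the array
        total += np.logical_and(a[start:stop], b[start:stop]).sum(axis=0)
    return total


# small illustrative sizes; 1050 is not a multiple of 100 on purpose
n_rows, n_cols, chunk_size = 1050, 4, 100
rng = np.random.RandomState(0)
data1 = rng.randint(2, size=(n_rows, n_cols)).astype(np.uint8)
data2 = rng.randint(2, size=(n_rows, n_cols)).astype(np.uint8)

# write both arrays to memmap files in a temporary directory
tmpdir = tempfile.mkdtemp()
path1 = os.path.join(tmpdir, 'arr1')
path2 = os.path.join(tmpdir, 'arr2')
for path, data in ((path1, data1), (path2, data2)):
    mm = np.memmap(path, dtype='uint8', mode='w+', shape=data.shape)
    mm[:] = data
    mm.flush()
    del mm

# reopen read-only and compare the chunked result with the in-memory one
arr1 = np.memmap(path1, dtype='uint8', mode='r', shape=(n_rows, n_cols))
arr2 = np.memmap(path2, dtype='uint8', mode='r', shape=(n_rows, n_cols))

chunked = chunked_and_count(arr1, arr2, chunk_size)
in_memory = np.logical_and(data1, data2).sum(axis=0)
print(np.array_equal(chunked, in_memory))
```

If the two results agree on small data, the same function should carry
over to the full-size memmaps unchanged, since only chunk_size rows are
combined at any one time.
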
Skipper
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion