On Wed, 18 May 2011 15:09:31 -0700, G Jones wrote:
[clip]
> import numpy as np
> 
> x = np.memmap('mybigfile.bin',mode='r',dtype='uint8')
> print x.shape   # prints (42940071360,) in my case
> ndat = x.shape[0]
> for k in range(1000):
>   # The astype ensures that the data is read in from disk
>   y = x[k*ndat/1000:(k+1)*ndat/1000].astype('float32')
>   del y
> 
> One would expect such a program would have a roughly constant memory
> footprint, but in fact 'top' shows that the RES memory continually
> increases. I can see that the memory usage is actually occurring because
> the OS eventually starts to swap to disk. The memory usage does not seem
> to correspond with the total size of the file.

Your OS probably prefers to keep the pages you have touched cached in
memory, swapping other data out if necessary, rather than dropping them.
This happens at least on Linux.

You can check that an equivalent simple C program displays
the same behavior (run it against a file named "data" that is
at least `size` bytes long):

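#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main()
{
    unsigned long size = 2000000000;  /* bytes to map from "data" */
    unsigned long i;
    char *p;
    int fd;
    volatile char sum;  /* volatile, so the compiler keeps the read loop */

    fd = open("data", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    p = (char*)mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    /* Touch every byte so that each page is faulted in from disk. */
    sum = 0;
    for (i = 0; i < size; ++i) {
        sum += *(p + i);
    }
    munmap(p, size);
    close(fd);

    return 0;
}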
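
If you would rather watch the same effect from Python than from 'top',
something like the following rough sketch might do. It is written for
Python 3, and it assumes a Unix-like system where the standard `resource`
module is available. The names `scan_in_chunks` and `nchunks` are made up
for this illustration. It reports the peak resident set size after each
chunk, not the current one, so the numbers can only grow or stay flat.

```python
import resource

import numpy as np


def scan_in_chunks(path, nchunks=10):
    """Read a file through np.memmap, one chunk at a time.

    Returns (byte_sum, peaks). peaks holds the peak resident set size of
    this process after each chunk, as reported by getrusage (kilobytes on
    Linux, bytes on macOS).
    """
    x = np.memmap(path, mode='r', dtype='uint8')
    ndat = x.shape[0]
    byte_sum = 0
    peaks = []
    for k in range(nchunks):
        lo = k * ndat // nchunks
        hi = (k + 1) * ndat // nchunks
        # Summing touches every byte, so each page is faulted in from disk.
        byte_sum += int(x[lo:hi].sum(dtype=np.uint64))
        peaks.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    del x
    return byte_sum, peaks
```

Calling scan_in_chunks('mybigfile.bin', 1000) and printing the returned
list should show whether the resident size keeps climbing as more of the
file is touched, which is what I would expect on Linux.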

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion