MemoryError vs malloc error

2011-07-15 Thread Amit Dev
Hi,

I have a long-running Python process (running on FreeBSD). Sometimes when
it uses too much memory it core dumps. I would have expected it to raise
MemoryError instead. Normally, when would a Python process raise
MemoryError, and when would it instead fail with a malloc error and a core
dump? This is happening in pure Python code (e.g. ' '.join(biglist), etc.).
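
As a rough illustration of the distinction (a sketch in Python 3 on a POSIX
system; the 1 GiB cap and 4 GiB allocation are arbitrary values, and
RLIMIT_AS enforcement varies by platform): when malloc returns NULL,
CPython's allocator raises MemoryError; a core dump instead usually points
at a C extension ignoring a failed allocation, heap corruption, or the OS
killing the process.

```python
import resource

# Cap the address space at 1 GiB (illustrative value), then over-allocate.
# CPython checks for NULL from malloc and raises MemoryError cleanly.
hard = resource.getrlimit(resource.RLIMIT_AS)[1]
resource.setrlimit(resource.RLIMIT_AS, (1 << 30, hard))

try:
    blob = ' ' * (4 << 30)        # ~4 GiB string, well over the cap
    outcome = 'allocated'
except MemoryError:
    outcome = 'MemoryError'
print(outcome)                    # 'MemoryError' where RLIMIT_AS is enforced
```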

Regards,
Amit
-- 
http://mail.python.org/mailman/listinfo/python-list


Reference Cycles with instance method

2011-03-08 Thread Amit Dev
Simple question. If I have the following code:

class A:
    def __init__(self, s):
        self.s = s
        self.m2 = self.m1

    def m1(self):
        pass

if __name__ == '__main__':
    a = A("ads")
    a.m1()
    a = None

The object is not garbage collected immediately, since there appears to be
a cycle (between the bound method stored as m2 and the instance). I would
expect this to behave the same as defining another method
"def m2(self): self.m1()", but unfortunately it does not.
In the above case m2 ends up in a.__dict__, which is what creates the
cycle. Any idea why this is so?
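
A small sketch (Python 3 syntax; weakref is used only to observe
collection) showing that the stored bound method does form a collectable
cycle: refcounting alone never frees the instance, but the cyclic
collector does.

```python
import gc
import weakref

class A:
    def __init__(self, s):
        self.s = s
        self.m2 = self.m1    # bound method in a.__dict__ references a -> cycle

    def m1(self):
        pass

a = A("ads")
ref = weakref.ref(a)         # watch the instance without keeping it alive
a = None                     # drop the only direct reference

alive_before_gc = ref() is not None
gc.collect()                 # the cycle detector reclaims the pair
alive_after_gc = ref() is not None
print(alive_before_gc, alive_after_gc)   # True False
```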

Regards,
Amit
-- 
http://mail.python.org/mailman/listinfo/python-list


Memory Usage of Strings

2011-03-16 Thread Amit Dev
I'm observing a strange memory usage pattern with strings. Consider the
following session. The idea is to create a list holding strings whose
cumulative character count is 100MB.

>>> l = []
>>> for i in xrange(100000):
...     l.append(str(i) * (1000/len(str(i))))

This uses around 100MB of memory as expected and 'del l' will clear that.


>>> for i in xrange(20000):
...     l.append(str(i) * (5000/len(str(i))))

This is using 165MB of memory. I really don't understand where the
additional memory usage is coming from.

If I reduce the per-string size, usage stays high until the size drops to
around 1000 characters; at that point it is back to 100MB.

Python 2.6.4 on FreeBSD.
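
For what it's worth, the two loops can be re-created at one-tenth scale
(Python 3 syntax, so // for integer division) to confirm that they build
almost exactly the same number of characters; any difference in resident
memory would then be down to the allocator rather than the data:

```python
# One-tenth scale re-creation of the two loops; at the original sizes
# (100000/1000 and 20000/5000) both totals come to roughly 100,000,000.
def build(n, chunk):
    return [str(i) * (chunk // len(str(i))) for i in range(n)]

first_total = sum(map(len, build(10000, 1000)))   # original: 100000, 1000
second_total = sum(map(len, build(2000, 5000)))   # original: 20000, 5000
print(first_total, second_total)                  # 9999100 9998200
```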

Regards,
Amit
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Memory Usage of Strings

2011-03-16 Thread Amit Dev
sum(map(len, l)) => 99998200 for 1st case and 99999100 for 2nd case.
Roughly 100MB each, as I mentioned.

On Wed, Mar 16, 2011 at 11:21 PM, John Gordon  wrote:
> In  Amit Dev 
>  writes:
>
>> I'm observing a strange memory usage pattern with strings. Consider
>> the following session. Idea is to create a list which holds some
>> strings so that cumulative characters in the list is 100MB.
>
>> >>> l = []
>> >>> for i in xrange(100000):
>> ...     l.append(str(i) * (1000/len(str(i))))
>
>> This uses around 100MB of memory as expected and 'del l' will clear that.
>
>> >>> for i in xrange(20000):
>> ...     l.append(str(i) * (5000/len(str(i))))
>
>> This is using 165MB of memory. I really don't understand where the
>> additional memory usage is coming from.
>
>> If I reduce the string size, it remains high till it reaches around
>> 1000. In that case it is back to 100MB usage.
>
> I don't know anything about the internals of python storage -- overhead,
> possible merging of like strings, etc.  but some simple character counting
> shows that these two loops do not produce the same number of characters.
>
> The first loop produces:
>
> Ten single-digit values of i which are repeated 1000 times for a total of
> 10000 characters;
>
> Ninety two-digit values of i which are repeated 500 times for a total of
> 45000 characters;
>
> Nine hundred three-digit values of i which are repeated 333 times for a
> total of 299700 characters;
>
> Nine thousand four-digit values of i which are repeated 250 times for a
> total of 2250000 characters;
>
> Ninety thousand five-digit values of i which are repeated 200 times for
> a total of 18000000 characters.
>
> All that adds up to a grand total of 20604700 characters.
>
> Or, to condense the above long-winded text in table form:
>
> range        num    digits  1000/len(str(i))  total chars
> 0-9             10      1         1000               10000
> 10-99           90      2          500               45000
> 100-999        900      3          333              299700
> 1000-9999     9000      4          250             2250000
> 10000-99999  90000      5          200            18000000
>
>                        grand total chars         20604700
>
> The second loop yields this table:
>
> range        num    digits  5000/len(str(i))  total bytes
> 0-9             10      1         5000               50000
> 10-99           90      2         2500              225000
> 100-999        900      3         1666             1499400
> 1000-9999     9000      4         1250            11250000
> 10000-19999  10000      5         1000            10000000
>
>                        grand total chars         23024400
>
> The two loops do not produce the same numbers of characters, so I'm not
> surprised they do not consume the same amount of storage.
>
> P.S.: Please forgive me if I've made some basic math error somewhere.
>
> --
> John Gordon                   A is for Amy, who fell down the stairs
> [email protected]              B is for Basil, assaulted by bears
>                                -- Edward Gorey, "The Gashlycrumb Tinies"
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Memory Usage of Strings

2011-03-16 Thread Amit Dev
Thanks Dan for the detailed reply. I suspect it is related to FreeBSD
malloc/free as you suggested. Here is the output of running your
script:

[16-bsd01 ~/work]$ python strm.py --first
USERPID %CPU %MEM   VSZ   RSS  TT  STAT STARTED  TIME COMMAND
amdev  6899  3.0  6.9 111944 107560  p0  S+9:57PM   0:01.20 python
strm.py --first (python2.5)
amdev  6900  0.0  0.1  3508  1424  p0  S+9:57PM   0:00.02 sh -c ps
aux | egrep '\\<6899\\>|^USER\\>'
amdev  6902  0.0  0.1  3380  1188  p0  S+9:57PM   0:00.01 egrep
\\<6899\\>|^USER\\>

[16-bsd01 ~/work]$ python strm.py --second
USERPID %CPU %MEM   VSZ   RSS  TT  STAT STARTED  TIME COMMAND
amdev  6903  0.0 10.5 166216 163992  p0  S+9:57PM   0:00.92 python
strm.py --second (python2.5)
amdev  6904  0.0  0.1  3508  1424  p0  S+9:57PM   0:00.02 sh -c ps
aux | egrep '\\<6903\\>|^USER\\>'
amdev  6906  0.0  0.1  3508  1424  p0  R+9:57PM   0:00.00 egrep
\\<6903\\>|^USER\\> (sh)

Regards,
Amit

On Thu, Mar 17, 2011 at 3:21 AM, Dan Stromberg  wrote:
>
> On Wed, Mar 16, 2011 at 8:38 AM, Amit Dev  wrote:
>>
>> I'm observing a strange memory usage pattern with strings. Consider
>> the following session. Idea is to create a list which holds some
>> strings so that cumulative characters in the list is 100MB.
>>
>> >>> l = []
>> >>> for i in xrange(100000):
>> ...     l.append(str(i) * (1000/len(str(i))))
>>
>> This uses around 100MB of memory as expected and 'del l' will clear that.
>>
>>
>> >>> for i in xrange(20000):
>> ...     l.append(str(i) * (5000/len(str(i))))
>>
>> This is using 165MB of memory. I really don't understand where the
>> additional memory usage is coming from.
>>
>> If I reduce the string size, it remains high till it reaches around
>> 1000. In that case it is back to 100MB usage.
>>
>> Python 2.6.4 on FreeBSD.
>>
>> Regards,
>> Amit
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>
> On Python 2.6.6 on Ubuntu 10.10:
>
> $ cat pmu
> #!/usr/bin/python
>
> import os
> import sys
>
> list_ = []
>
> if sys.argv[1] == '--first':
>     for i in xrange(100000):
>         list_.append(str(i) * (1000/len(str(i))))
> elif sys.argv[1] == '--second':
>     for i in xrange(20000):
>         list_.append(str(i) * (5000/len(str(i))))
> else:
>     sys.stderr.write('%s: Illegal sys.argv[1]\n' % sys.argv[0])
>     sys.exit(1)
>
> os.system("ps aux | egrep '\<%d\>|^USER\>'" % os.getpid())
>
> dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
> above cmd done 2011 Wed Mar 16 02:38 PM
>
> $ make
> ./pmu --first
> USER   PID %CPU %MEM    VSZ   RSS TTY  STAT START   TIME COMMAND
> 1000 11063  0.0  3.4 110212 104436 pts/5   S+   14:38   0:00
> /usr/bin/python ./pmu --first
> 1000 11064  0.0  0.0   1896   512 pts/5    S+   14:38   0:00 sh -c ps
> aux | egrep '\<11063\>|^USER\>'
> 1000 11066  0.0  0.0   4012   740 pts/5    S+   14:38   0:00 egrep
> \<11063\>|^USER\>
> ./pmu --second
> USER   PID %CPU %MEM    VSZ   RSS TTY  STAT START   TIME COMMAND
> 1000 11067 13.0  3.3 107540 101536 pts/5   S+   14:38   0:00
> /usr/bin/python ./pmu --second
> 1000 11068  0.0  0.0   1896   508 pts/5    S+   14:38   0:00 sh -c ps
> aux | egrep '\<11067\>|^USER\>'
> 1000 11070  0.0  0.0   4012   740 pts/5    S+   14:38   0:00 egrep
> \<11067\>|^USER\>
> dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
> above cmd done 2011 Wed Mar 16 02:38 PM
>
> So on Python 2.6.6 + Ubuntu 10.10, the second is actually a little smaller
> than the first.
>
> Some issues you might ponder:
> 1) Does FreeBSD's malloc/free know how to free unused memory pages in the
> middle of the heap (using mmap games), or does it only sbrk() down when the
> end of the heap becomes unused, or does it never sbrk() back down at all?
> I've heard various *ix's fall into one of these 3 groups in releasing unused
> pages.
>
> 2) It might be just an issue of how frequently the interpreter garbage
> collects; you could try adjusting this; check out the gc module.  Note that
> it's often faster not to collect at every conceivable opportunity, but this
> tends to add up the bytes pretty quickly in some scripts - for a while,
> until the next collection.  So your memory use pattern will often end up
> looking like a bit of a sawtooth function.
>
> 3) If you need strict memory use guarantees, you might be better off with a
> language that's closer to the metal, like C - something that isn't garbage
> collected is one parameter to consider.  If you already have something in
> CPython, then Cython might help; Cython allows you to use C datastructures
> from a dialect of Python.
>
>
>
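
Point 2 above can be sketched with the gc module's threshold knobs (the
value 10000 here is arbitrary, purely for illustration): raising the
generation-0 threshold makes collections rarer, trading a taller garbage
"sawtooth" for less collector overhead.

```python
import gc

# The cyclic collector runs once allocations minus deallocations exceed
# the generation-0 threshold (default is typically (700, 10, 10)).
print(gc.get_threshold())
gc.set_threshold(10000, 10, 10)   # collect generation 0 less often
tuned = gc.get_threshold()
print(tuned)                      # (10000, 10, 10)
```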
-- 
http://mail.python.org/mailman/listinfo/python-list