MemoryError vs malloc error
Hi,

I have a long-running Python process (on FreeBSD). Sometimes, when it
uses too much memory, it dumps core. I would have expected it to raise
MemoryError instead. Normally, when would a Python process raise
MemoryError, and when would it fail with a malloc error and dump core?
This is happening in pure Python code (e.g. ' '.join(biglist)).

Regards,
Amit
--
http://mail.python.org/mailman/listinfo/python-list
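P.S. For reference, the behaviour I expected -- a minimal sketch
(assuming CPython on a 64-bit build):

try:
    s = ' ' * (10 ** 12)  # one oversized allocation, roughly 1TB
except MemoryError:
    print('caught MemoryError')

As I understand it, this normally raises MemoryError because the single
allocation fails cleanly, but an OS that overcommits memory can kill
the process instead.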
Reference Cycles with instance method
Simple question. If I have the following code:
class A:
    def __init__(self, s):
        self.s = s
        self.m2 = self.m1   # stores the bound method, which references self

    def m1(self):
        pass

if __name__ == '__main__':
    a = A("ads")
    a.m1()
    a = None
The object is not garbage collected (at least not immediately), since
there appears to be a cycle (between the bound method stored as m2 and
the A instance). I would expect this to behave the same as having
another method "def m2(self): self.m1()", but unfortunately it's not.
In the above case m2 ends up in a.__dict__, which is what creates the
cycle. Any idea why this is so?
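A quick way to see it, reusing the class A above (a sketch, assuming
CPython's reference counting plus the gc and weakref modules):

import gc
import weakref

a = A("ads")
r = weakref.ref(a)
a = None                 # refcounting alone cannot free the cycle
print(r() is not None)   # True: the instance is still alive
gc.collect()             # the cycle collector reclaims it
print(r())               # None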
Regards,
Amit
--
http://mail.python.org/mailman/listinfo/python-list
Memory Usage of Strings
I'm observing a strange memory usage pattern with strings. Consider
the following session. The idea is to create a list which holds some
strings, so that the cumulative characters in the list come to 100MB.

>>> l = []
>>> for i in xrange(100000):
...     l.append(str(i) * (1000/len(str(i))))

This uses around 100MB of memory as expected, and 'del l' will clear
that.

>>> for i in xrange(20000):
...     l.append(str(i) * (5000/len(str(i))))

This is using 165MB of memory. I really don't understand where the
additional memory usage is coming from.

If I reduce the string size, memory usage remains high till the size
reaches around 1000. In that case it is back to 100MB usage.

Python 2.6.4 on FreeBSD.

Regards,
Amit
--
http://mail.python.org/mailman/listinfo/python-list
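P.S. A sketch of one check (using sys.getsizeof, available since
Python 2.6) to see whether the string objects themselves account for
the difference:

import sys

# build the same two lists as above
l1 = [str(i) * (1000 / len(str(i))) for i in xrange(100000)]
l2 = [str(i) * (5000 / len(str(i))) for i in xrange(20000)]

# payload plus per-string header; on CPython 2.x these totals should
# come out within a few MB of each other
print(sum(sys.getsizeof(s) for s in l1))
print(sum(sys.getsizeof(s) for s in l2))

If the totals are close, the extra 65MB is not in the string objects
themselves.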
Re: Memory Usage of Strings
sum(map(len, l)) => 99998200 for 1st case and 99999100 for 2nd case.
Roughly 100MB as I mentioned.

On Wed, Mar 16, 2011 at 11:21 PM, John Gordon wrote:
> In Amit Dev writes:
>
>> I'm observing a strange memory usage pattern with strings. Consider
>> the following session. The idea is to create a list which holds some
>> strings, so that the cumulative characters in the list come to 100MB.
>
>> >>> l = []
>> >>> for i in xrange(100000):
>> ...     l.append(str(i) * (1000/len(str(i))))
>
>> This uses around 100MB of memory as expected and 'del l' will clear that.
>
>> >>> for i in xrange(20000):
>> ...     l.append(str(i) * (5000/len(str(i))))
>
>> This is using 165MB of memory. I really don't understand where the
>> additional memory usage is coming from.
>
>> If I reduce the string size, memory usage remains high till the size
>> reaches around 1000. In that case it is back to 100MB usage.
>
> I don't know anything about the internals of python storage -- overhead,
> possible merging of like strings, etc. but some simple character counting
> shows that these two loops do not produce the same number of characters.
>
> The first loop produces:
>
> Ten single-digit values of i which are repeated 1000 times for a total of
> 10000 characters;
>
> Ninety two-digit values of i which are repeated 500 times for a total of
> 45000 characters;
>
> Nine hundred three-digit values of i which are repeated 333 times for a
> total of 299700 characters;
>
> Nine thousand four-digit values of i which are repeated 250 times for a
> total of 2250000 characters;
>
> Ninety thousand five-digit values of i which are repeated 200 times for
> a total of 18000000 characters.
>
> All that adds up to a grand total of 20604700 characters.
>
> Or, to condense the above long-winded text in table form:
>
> range          num  digits  1000/len(str(i))  total chars
> 0-9             10    1         1000                10000
> 10-99           90    2          500                45000
> 100-999        900    3          333               299700
> 1000-9999     9000    4          250              2250000
> 10000-99999  90000    5          200             18000000
>
> grand total chars                                20604700
>
> The second loop yields this table:
>
> range          num  digits  5000/len(str(i))  total bytes
> 0-9             10    1         5000                50000
> 10-99           90    2         2500               225000
> 100-999        900    3         1666              1499400
> 1000-9999     9000    4         1250             11250000
> 10000-19999  10000    5         1000             10000000
>
> grand total chars                                23024400
>
> The two loops do not produce the same numbers of characters, so I'm not
> surprised they do not consume the same amount of storage.
>
> P.S.: Please forgive me if I've made some basic math error somewhere.
>
> --
> John Gordon                  A is for Amy, who fell down the stairs
> [email protected]             B is for Basil, assaulted by bears
>                               -- Edward Gorey, "The Gashlycrumb Tinies"
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
--
http://mail.python.org/mailman/listinfo/python-list
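As a cross-check, counting the characters actually stored (a sketch,
assuming the loop bounds quoted above; each list entry is str(i)
repeated chunk/len(str(i)) times, so its length is a little under
chunk):

def total_chars(n, chunk):
    return sum(len(str(i)) * (chunk // len(str(i))) for i in xrange(n))

print(total_chars(100000, 1000))  # 99999100
print(total_chars(20000, 5000))   # 99998200

Both lists hold roughly 100MB of character data, which matches
sum(map(len, l)).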
Re: Memory Usage of Strings
Thanks Dan for the detailed reply. I suspect it is related to FreeBSD
malloc/free as you suggested. Here is the output of running your
script:
[16-bsd01 ~/work]$ python strm.py --first
USER    PID %CPU %MEM    VSZ    RSS  TT  STAT STARTED    TIME COMMAND
amdev  6899  3.0  6.9 111944 107560  p0  S+   9:57PM 0:01.20 python
strm.py --first (python2.5)
amdev  6900  0.0  0.1   3508   1424  p0  S+   9:57PM 0:00.02 sh -c ps
aux | egrep '\<6899\>|^USER\>'
amdev  6902  0.0  0.1   3380   1188  p0  S+   9:57PM 0:00.01 egrep
\<6899\>|^USER\>
[16-bsd01 ~/work]$ python strm.py --second
USER    PID %CPU %MEM    VSZ    RSS  TT  STAT STARTED    TIME COMMAND
amdev  6903  0.0 10.5 166216 163992  p0  S+   9:57PM 0:00.92 python
strm.py --second (python2.5)
amdev  6904  0.0  0.1   3508   1424  p0  S+   9:57PM 0:00.02 sh -c ps
aux | egrep '\<6903\>|^USER\>'
amdev  6906  0.0  0.1   3508   1424  p0  R+   9:57PM 0:00.00 egrep
\<6903\>|^USER\> (sh)
Regards,
Amit
On Thu, Mar 17, 2011 at 3:21 AM, Dan Stromberg wrote:
>
> On Wed, Mar 16, 2011 at 8:38 AM, Amit Dev wrote:
>>
>> I'm observing a strange memory usage pattern with strings. Consider
>> the following session. The idea is to create a list which holds some
>> strings, so that the cumulative characters in the list come to 100MB.
>>
>> >>> l = []
>> >>> for i in xrange(100000):
>> ...     l.append(str(i) * (1000/len(str(i))))
>>
>> This uses around 100MB of memory as expected and 'del l' will clear that.
>>
>>
>> >>> for i in xrange(20000):
>> ...     l.append(str(i) * (5000/len(str(i))))
>>
>> This is using 165MB of memory. I really don't understand where the
>> additional memory usage is coming from.
>>
>> If I reduce the string size, memory usage remains high till the size
>> reaches around 1000. In that case it is back to 100MB usage.
>>
>> Python 2.6.4 on FreeBSD.
>>
>> Regards,
>> Amit
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>
> On Python 2.6.6 on Ubuntu 10.10:
>
> $ cat pmu
> #!/usr/bin/python
>
> import os
> import sys
>
> list_ = []
>
> if sys.argv[1] == '--first':
>     for i in xrange(100000):
>         list_.append(str(i) * (1000/len(str(i))))
> elif sys.argv[1] == '--second':
>     for i in xrange(20000):
>         list_.append(str(i) * (5000/len(str(i))))
> else:
>     sys.stderr.write('%s: Illegal sys.argv[1]\n' % sys.argv[0])
>     sys.exit(1)
>
> os.system("ps aux | egrep '\<%d\>|^USER\>'" % os.getpid())
>
> dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
> above cmd done 2011 Wed Mar 16 02:38 PM
>
> $ make
> ./pmu --first
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> 1000 11063 0.0 3.4 110212 104436 pts/5 S+ 14:38 0:00
> /usr/bin/python ./pmu --first
> 1000 11064 0.0 0.0 1896 512 pts/5 S+ 14:38 0:00 sh -c ps
> aux | egrep '\<11063\>|^USER\>'
> 1000 11066 0.0 0.0 4012 740 pts/5 S+ 14:38 0:00 egrep
> \<11063\>|^USER\>
> ./pmu --second
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> 1000 11067 13.0 3.3 107540 101536 pts/5 S+ 14:38 0:00
> /usr/bin/python ./pmu --second
> 1000 11068 0.0 0.0 1896 508 pts/5 S+ 14:38 0:00 sh -c ps
> aux | egrep '\<11067\>|^USER\>'
> 1000 11070 0.0 0.0 4012 740 pts/5 S+ 14:38 0:00 egrep
> \<11067\>|^USER\>
> dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
> above cmd done 2011 Wed Mar 16 02:38 PM
>
> So on Python 2.6.6 + Ubuntu 10.10, the second is actually a little smaller
> than the first.
>
> Some issues you might ponder:
> 1) Does FreeBSD's malloc/free know how to free unused memory pages in the
> middle of the heap (using mmap games), or does it only sbrk() down when the
> end of the heap becomes unused, or does it never sbrk() back down at all?
> I've heard various *ix's fall into one of these 3 groups in releasing unused
> pages.
>
> 2) It might just be an issue of how frequently the interpreter garbage
> collects; you could try adjusting this; check out the gc module. Note that
> it's often faster not to collect at every conceivable opportunity, but this
> tends to add up the bytes pretty quickly in some scripts - for a while,
> until the next collection. So your memory use pattern will often end up
> looking like a bit of a sawtooth function.
>
> 3) If you need strict memory use guarantees, you might be better off with a
> language that's closer to the metal, like C - something that isn't garbage
> collected is one parameter to consider. If you already have something in
> CPython, then Cython might help; Cython allows you to use C data structures
> from a dialect of Python.
>
>
>
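Regarding point (2) above, a sketch of what adjusting the collector
could look like (CPython's gc module; the threshold values here are
only illustrative, not recommendations):

import gc

print(gc.get_threshold())    # CPython default: (700, 10, 10)
gc.set_threshold(100, 5, 5)  # collect cycles more eagerly
gc.collect()                 # or force a full collection at a known point

Though since plain strings don't participate in reference cycles, I
suspect tuning the collector won't change much in this particular case.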
--
http://mail.python.org/mailman/listinfo/python-list
