Re: Composite fields -- (ir)regular status report #3
On Monday, July 22, 2013 2:48:00 AM UTC+3, Michal Petrucha wrote:
>
> Hello,
>
> I have some awesome news today. At long last I managed to finally get
> the refactor of ForeignKey to pass the entire test suite. It's only
> one configuration (CPython 2.7 + SQLite), but it's a start. Due to the
> nature of my changes, I expect that only database creation should be
> prone to errors on other backends, otherwise I didn't really touch any
> of the SQL generating code.
>
> So, the plan for the immediate future is that I'm going to spend the
> next few days fixing any remaining regressions on other database
> backends. When this is done, I guess I'll add some more tests for the
> field cloning mechanism -- a few more tests can never hurt.
> Afterwards, I'll proceed with CompositeField and generally try to
> advance through the list of items in my project's timeline.
>
> In the meantime, I think it's time to start reviewing the changes.
> This is the first self-contained changeset delivered by this GSoC and
> the sooner we get it into an acceptable state suitable for merging
> into master, the less code built on top of it will have to be adapted
> to changes warranted by reviews. So, if anyone finds some free time
> and will to sift through the internals of the ORM, you're welcome to
> have a look at my GitHub repo [1]. I can also create a pull request in
> case it makes things easier for anyone.

I did a quick review of the patch and didn't see anything that jumped out.

BTW when you want to merge a part of your work, please open a pull request. Reviewing is much easier that way. I hope to get some time to review the work in full next week.

 - Anssi

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
For more options, visit https://groups.google.com/groups/opt_out.
Re: Streaming sitemaps
I don't think the queryset needs to be loaded into memory. There is an iterator() method available:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#iterator

On Thursday, July 25, 2013 11:56:09 AM UTC+2, Aymeric Augustin wrote:
>
> Hi Julian,
>
> Thanks for the suggestion. This is an interesting idea, however, I'd like
> to see evidence that the performance improvement is worth the extra
> complexity before going forwards.
>
> > Since 1.5 we have streaming responses. What is the state of
> > contrib.sitemaps in this regard? I have some very large sitemaps and
> > experimented with making them faster a few years ago.
>
> And what were the results of this experiment?
>
> > If they do not yet stream, I think this would be a good idea to get
> > memory usage down. Is there anything to keep an eye on? Would it be
> > valuable to Django if this is looked into?
>
> Large sitemaps are usually generated from a queryset, and the queryset
> will be loaded in memory as soon as the first item is accessed. Streaming
> the sitemap won't help at this level.
>
> --
> Aymeric.
Re: Streaming sitemaps
On 26 Jul 2013, at 09:40, julianb wrote:

> I don't think the queryset needs to be loaded into memory. There is an
> iterator() method available:

I don't think .iterator() does what you expect. See http://thebuild.com/presentations/unbreaking-django.pdf, slides 62 and 63.

If you're careful, model instances will be garbage-collected, but you still have the entire psycopg result set in memory. To improve performance in this case you're probably better off with .values() or .values_list(), or more likely by splitting your sitemap.

--
Aymeric.
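The "splitting your sitemap" suggestion can be sketched without Django at all. The helper names `pages` and `render_urlset` below are hypothetical, and the `(loc, lastmod)` tuples only stand in for the kind of rows a `.values_list()` call might return. This is a rough illustration of the idea under those assumptions, not how contrib.sitemaps is implemented.

```python
from itertools import islice
from xml.sax.saxutils import escape


def pages(rows, page_size):
    """Lazily split any iterable of rows into lists of at most page_size."""
    it = iter(rows)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


def render_urlset(page):
    """Yield one sitemap file piece by piece, suitable for streaming."""
    yield '<?xml version="1.0" encoding="UTF-8"?>\n'
    yield '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    for loc, lastmod in page:
        # Escape values so that "&" and "<" in URLs keep the XML well-formed.
        yield "  <url><loc>%s</loc><lastmod>%s</lastmod></url>\n" % (
            escape(loc),
            escape(lastmod),
        )
    yield "</urlset>\n"
```

Only one page of plain tuples is held at a time, so peak memory depends on `page_size` rather than on the total number of URLs, provided the source of `rows` is itself lazy.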
Avoid unbounded memory consumption when running `manage.py test`
TestSuite holds references to each TestCase instance. So attributes of my TestCase subclasses don't get freed by the garbage collector until the last reference to TestSuite disappears, which isn't until the entire test run ends. In a large test suite, the test run exhausts memory and the OS kills it before the suite finishes.

The most obvious solution to me is to put tearDown methods in all my TestCase subclasses that delete the references to their own attributes. But it seems like there has to be a more general, automatic solution. I can start to answer my own question. But I'm interested to know if others run into similar problems and how you solve them.

It looks like there are at least two participants that hold TestCase references longer than desirable.

1. TestSuite. Here's a minimal hack to release the reference that TestSuite would otherwise hold after a TestCase runs until the remainder of the suite had finished.

diff --git a/django/utils/unittest/suite.py b/django/utils/unittest/suite.py
index f39569b..8530200 100644
--- a/django/utils/unittest/suite.py
+++ b/django/utils/unittest/suite.py
@@ -1,5 +1,6 @@
 """TestSuite"""
 
+import gc
 import sys
 import unittest
 from django.utils.unittest import case, util
@@ -96,7 +97,11 @@ class TestSuite(BaseTestSuite):
     # private methods
 
     def _wrapped_run(self, result, debug=False):
-        for test in self:
+        while True:
+            try:
+                test = self._tests.pop(0)
+            except IndexError:
+                break
             if result.shouldStop:
                 break
 
@@ -116,6 +121,7 @@ class TestSuite(BaseTestSuite):
                 test(result)
             else:
                 test.debug()
+            #gc.collect()
 
     def _handleClassSetUp(self, test, result):
         previousClass = getattr(result, '_previousTestClass', None)

I believe gc.collect() in the above is unnecessary. Calling it explicitly accelerates the collection, but the automatic collector will free the TestCase from memory soon enough as long as we can manage its lifetime by eliminating all the references to it.

2. 
TestResult and subclasses may hold references to TestCase instances via addSuccess() and friends. For example, django.utils.unittest.result.TestResult.addError does:

    self.errors.append((test, self._exc_info_to_string(err, test)))

In my current scenario, I'm using unittest-xml-reporting's TestResult subclass, and the following patch eliminates the references to TestCase instances that it would otherwise save.

--- __init__.py~ 2013-05-02 16:45:19.0 -0400
+++ __init__.py 2013-07-18 21:39:13.0 -0400
@@ -8,6 +8,9 @@
 import os
 import sys
 import time
+
+import psutil
+
 from unittest import TestResult, _TextTestResult, TextTestRunner
 
 try:
@@ -27,13 +30,24 @@
         self.delegate = delegate
 
     def write(self, text):
-        self._captured.write(text)
+        #self._captured.write(text)
         self.delegate.write(text)
 
     def __getattr__(self, attr):
         return getattr(self._captured, attr)
 
 
+def testcase_name(test_method):
+    testcase = type(test_method)
+
+    # Ignore module name if it is '__main__'
+    module = testcase.__module__ + '.'
+    if module == '__main__.':
+        module = ''
+    result = module + testcase.__name__
+    return result
+
+
 class _TestInfo(object):
     """This class keeps useful information about the execution of a test method.
@@ -44,11 +58,21 @@
 
     def __init__(self, test_result, test_method, outcome=SUCCESS, err=None):
         self.test_result = test_result
-        self.test_method = test_method
         self.outcome = outcome
         self.elapsed_time = 0
         self.err = err
 
+        #self.test_method = test_method
+        self.test_description = self.test_result.getDescription(test_method)
+        self.test_exception_info = (
+            '' if not self.err
+            else self.test_result._exc_info_to_string(
+                self.err, test_method)
+        )
+
+        self.test_name = testcase_name(test_method)
+        self.test_id = test_method.id()
+
     def test_finished(self):
         """Save info that can only be calculated once a test has run.
         """
@@ -58,16 +82,18 @@
     def get_description(self):
         """Return a text representation of the test method.
         """
-        return self.test_result.getDescription(self.test_method)
+        #return self.test_result.getDescription(self.test_method)
+        return self.test_description
 
     def get_error_info(self):
         """Return a text representation of an exception thrown by a test method.
         """
-        if not self.err:
-            return ''
-        return self.test_result._exc_info_to_string(self.err, \
-                self.test_method)
+        #if not self.err:
+        #    return ''
+        #return self.test_result._exc_info_to_string(self.err, \
+
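The "more general, automatic" version of the tearDown idea mentioned at the start of this message could look roughly like the sketch below. `ReleasingTestCase` and `ExampleTest` are hypothetical names and are not part of Django or unittest. The sketch uses the standard library's unittest with Python 3 style `super()`, and it assumes that any attribute which appears on the instance during a run can safely be discarded afterwards.

```python
import unittest


class ReleasingTestCase(unittest.TestCase):
    """Hypothetical base class (not part of Django or unittest).

    Deletes every instance attribute that appeared during a test run, so
    objects created in setUp() or the test method can be collected even
    while a suite or result object still references the TestCase.
    """

    def run(self, result=None):
        # Names present before the run belong to unittest's own bookkeeping.
        before = set(vars(self))
        try:
            return super().run(result)
        finally:
            # Drop whatever setUp() or the test method attached to self.
            for name in set(vars(self)) - before:
                delattr(self, name)


class ExampleTest(ReleasingTestCase):
    def setUp(self):
        self.payload = bytearray(1024 * 1024)  # stand-in for a large fixture

    def test_payload_size(self):
        self.scratch = list(range(10))
        self.assertEqual(len(self.payload), 1024 * 1024)
```

Because the cleanup happens after `run()` returns, tearDown() and any registered cleanups still see the attributes. Only the lingering references are removed, so no change to TestSuite or TestResult is needed for the attributes themselves to become collectable.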
Re: Avoid unbounded memory consumption when running `manage.py test`
On Fri, Jul 26, 2013 at 11:03 AM, Matt McClure wrote:
> TestSuite holds references to each TestCase instance. So attributes of my
> TestCase subclasses don't get freed by the garbage collector until the last
> reference to TestSuite disappears, which isn't until the entire test run
> ends. In a large test suite, the test run exhausts memory and the OS kills
> it before the suite finishes.
>
> The most obvious solution to me is to put tearDown methods in all my
> TestCase subclasses that delete the references to their own attributes. But
> it seems like there has to be a more general, automatic solution. I can
> start to answer my own question. But I'm interested to know if others run
> into similar problems and how you solve them.
>
> It looks like there are at least two participants that hold TestCase
> references longer than desirable.
>
> 1. TestSuite. Here's a minimal hack to release the reference that TestSuite
> would otherwise hold after a TestCase runs until the remainder of the suite
> had finished.
>
> diff --git a/django/utils/unittest/suite.py b/django/utils/unittest/suite.py

django.utils.unittest is a copy of the Python >= 2.7 stdlib unittest library (aka unittest2).

Looking at the list of classes involved in your description, it seems you might want to report your findings to the upstream maintainers. That way, if it's actually confirmed as an issue, the solution will benefit many more people than just those doing testing in Django.

Regards,

--
Ramiro Morales
@ramiromorales
Re: Avoid unbounded memory consumption when running `manage.py test`
On 26 Jul 2013, at 16:27, Ramiro Morales wrote:

> django.utils.unittest is a copy of the Python>= 2.7 stdlib unittest library
> (aka unittest2)

Besides, it's deprecated in Django 1.7.

--
Aymeric.