Re: Composite fields -- (ir)regular status report #3

2013-07-26 Thread Anssi Kääriäinen


On Monday, July 22, 2013 2:48:00 AM UTC+3, Michal Petrucha wrote:
>
> Hello, 
>
> I have some awesome news today. At long last I managed to finally get 
> the refactor of ForeignKey to pass the entire test suite. It's only 
> one configuration (CPython 2.7 + SQLite), but it's a start. Due to the 
> nature of my changes, I expect that only database creation should be 
> prone to errors on other backends, otherwise I didn't really touch any 
> of the SQL generating code. 
>
> So, the plan for the immediate future is that I'm going to spend the 
> next few days fixing any remaining regressions on other database 
> backends. When this is done, I guess I'll add some more tests for the 
> field cloning mechanism -- a few more tests can never hurt. 
> Afterwards, I'll proceed with CompositeField and generally try to 
> advance through the list of items in my project's timeline. 
>
> In the meantime, I think it's time to start reviewing the changes. 
> This is the first self-contained changeset delivered by this GSoC and 
> the sooner we get it into an acceptable state suitable for merging 
> into master, the less code built on top of it will have to be adapted 
> to changes warranted by reviews. So, if anyone finds some free time 
> and will to sift through the internals of the ORM, you're welcome to 
> have a look at my GitHub repo [1]. I can also create a pull request in 
> case it makes things easier for anyone. 


I did a quick review of the patch and didn't see anything that jumped out. 
BTW when you want to merge a part of your work, please open a pull request. 
Reviewing is much easier that way.

I hope to get some time to review the work in full next week.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Streaming sitemaps

2013-07-26 Thread julianb
I don't think the queryset needs to be loaded into memory. There is an 
iterator() method available:

https://docs.djangoproject.com/en/dev/ref/models/querysets/#iterator

On Thursday, July 25, 2013 11:56:09 AM UTC+2, Aymeric Augustin wrote:
>
> Hi Julian, 
>
> Thanks for the suggestion. This is an interesting idea, however, I'd like 
> to see evidence that the performance improvement is worth the extra 
> complexity before going forwards. 
>
> > Since 1.5 we have streaming responses. What is the state of 
> contrib.sitemaps in this regard? I have some very large sitemaps and 
> experimented with making them faster a few years ago. 
>
> And what were the results of this experiment? 
>
> > If they do not yet stream, I think this would be a good idea to get 
> memory usage down. Is there anything to keep an eye on? Would it be 
> valuable to Django if this is looked into? 
>
> Large sitemaps are usually generated from a queryset, and the queryset 
> will be loaded in memory as soon as the first item is accessed. Streaming 
> the sitemap won't help at this level. 
>
> -- 
> Aymeric. 
>
>





Re: Streaming sitemaps

2013-07-26 Thread Aymeric Augustin
On 26 Jul 2013, at 09:40, julianb wrote:

> I don't think the queryset needs to be loaded into memory. There is an 
> iterator() method available:

I don't think .iterator() does what you expect. See 
http://thebuild.com/presentations/unbreaking-django.pdf, slides 62 and 63. If 
you're careful, model instances will be garbage-collected, but you still have 
the entire psycopg result set in memory.

To improve performance in this case you're probably better off with .values() 
or .values_list(), or more likely by splitting your sitemap.
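
As a rough illustration of the "splitting" option, here is a minimal sketch in plain Python. It assumes the URLs arrive as any iterable (in a Django project this might be a .values_list() queryset, which is an assumption and not something tested here). The names chunked, render_sitemap and SITEMAP_LIMIT are hypothetical helpers, not part of contrib.sitemaps.

```python
from itertools import islice
from xml.sax.saxutils import escape

# The sitemaps.org protocol caps a single sitemap file at 50,000 URLs.
SITEMAP_LIMIT = 50000


def chunked(iterable, size=SITEMAP_LIMIT):
    """Yield lists of at most `size` items without materialising the input."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def render_sitemap(urls):
    """Render one sitemap file for a single chunk of URLs (hypothetical helper)."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    lines.extend('  <url><loc>%s</loc></url>' % escape(url) for url in urls)
    lines.append('</urlset>')
    return '\n'.join(lines)


# Made-up URLs for illustration; a tiny chunk size keeps the demo readable.
urls = ('http://example.com/item/%d/' % pk for pk in range(7))
pages = [render_sitemap(chunk) for chunk in chunked(urls, 3)]
```

Only one chunk is held at a time while it is rendered, so peak memory depends on the chunk size and not on the total number of URLs, provided the source iterable itself does not buffer everything.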

-- 
Aymeric.





Avoid unbounded memory consumption when running `manage.py test`

2013-07-26 Thread Matt McClure
TestSuite holds references to each TestCase instance. So attributes of my 
TestCase subclasses don't get freed by the garbage collector until the last 
reference to TestSuite disappears, which isn't until the entire test run 
ends. In a large test suite, the test run exhausts memory and the OS kills 
it before the suite finishes.

The most obvious solution to me is to put tearDown methods in all my 
TestCase subclasses that delete the references to their own attributes. But 
it seems like there has to be a more general, automatic solution. I can 
start to answer my own question. But I'm interested to know if others run 
into similar problems and how you solve them.
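
For reference, the tearDown approach could look roughly like the sketch below. Payload and BigFixtureTest are hypothetical names used only for illustration, and the weak reference is there just to observe that the object is released while the TestCase instance is still alive.

```python
import gc
import unittest
import weakref

payload_refs = []  # weak references only, so this list keeps nothing alive


class Payload(object):
    """Hypothetical stand-in for a large object built in setUp()."""


class BigFixtureTest(unittest.TestCase):
    def setUp(self):
        self.payload = Payload()
        payload_refs.append(weakref.ref(self.payload))

    def test_payload_exists(self):
        self.assertIsNotNone(self.payload)

    def tearDown(self):
        # Drop the attribute so the payload can be freed even though the
        # TestCase instance itself is still referenced elsewhere.
        del self.payload


test = BigFixtureTest('test_payload_exists')  # kept alive on purpose
result = unittest.TestResult()
test.run(result)
gc.collect()
payload_freed = payload_refs[0]() is None
```

The drawback is the one already mentioned: every TestCase subclass has to remember to do this for each attribute it sets.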

It looks like there are at least two participants that hold TestCase 
references longer than desirable.

1.  TestSuite. Here's a minimal hack to release the reference that 
TestSuite would otherwise hold after a TestCase runs until the remainder of 
the suite had finished.

diff --git a/django/utils/unittest/suite.py b/django/utils/unittest/suite.py
index f39569b..8530200 100644
--- a/django/utils/unittest/suite.py
+++ b/django/utils/unittest/suite.py
@@ -1,5 +1,6 @@
 """TestSuite"""
 
+import gc
 import sys
 import unittest
 from django.utils.unittest import case, util
@@ -96,7 +97,11 @@ class TestSuite(BaseTestSuite):
 
     # private methods
     def _wrapped_run(self, result, debug=False):
-        for test in self:
+        while True:
+            try:
+                test = self._tests.pop(0)
+            except IndexError:
+                break
             if result.shouldStop:
                 break
 
@@ -116,6 +121,7 @@ class TestSuite(BaseTestSuite):
                 test(result)
             else:
                 test.debug()
+            #gc.collect()
 
     def _handleClassSetUp(self, test, result):
         previousClass = getattr(result, '_previousTestClass', None)

I believe gc.collect() in the above is unnecessary. Calling it explicitly 
accelerates the collection, but the automatic collector will free the 
TestCase from memory soon enough as long as we can manage its lifetime by 
eliminating all the references to it.
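
The same idea can be shown outside of TestSuite with a small standalone sketch. run_and_release is a hypothetical helper, not part of unittest or Django. It pops each test off a plain list before running it, and the weak references show that the instances are gone once the run is over.

```python
import gc
import unittest
import weakref


class SampleTest(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)


def run_and_release(tests, result):
    """Run each test, removing it from `tests` before it executes."""
    while tests:
        test = tests.pop(0)
        if result.shouldStop:
            break
        test(result)
        del test  # do not carry the instance into the next iteration


tests = [SampleTest('test_ok') for _ in range(3)]
refs = [weakref.ref(t) for t in tests]
result = unittest.TestResult()
run_and_release(tests, result)
gc.collect()  # only to make the check deterministic; not needed for correctness
all_released = all(ref() is None for ref in refs)
```

Note that this only helps for passing tests when a plain TestResult is used. As described in the next point, a result object can keep its own references to tests.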

2. TestResult and subclasses may hold references to TestCase instances via 
addSuccess() and friends. For example, 
django.utils.unittest.result.TestResult.addError does:

self.errors.append((test, self._exc_info_to_string(err, test)))

In my current scenario, I'm using unittest-xml-reporting's TestResult 
subclass, and the following patch eliminates the references to TestCase 
instances that it would otherwise save.

--- __init__.py~ 2013-05-02 16:45:19.0 -0400
+++ __init__.py 2013-07-18 21:39:13.0 -0400
@@ -8,6 +8,9 @@
 import os
 import sys
 import time
+
+import psutil
+
 from unittest import TestResult, _TextTestResult, TextTestRunner
 
 try:
@@ -27,13 +30,24 @@
         self.delegate = delegate
 
     def write(self, text):
-        self._captured.write(text)
+        #self._captured.write(text)
         self.delegate.write(text)
 
     def __getattr__(self, attr):
         return getattr(self._captured, attr)
 
 
+def testcase_name(test_method):
+    testcase = type(test_method)
+
+    # Ignore module name if it is '__main__'
+    module = testcase.__module__ + '.'
+    if module == '__main__.':
+        module = ''
+    result = module + testcase.__name__
+    return result
+
+
 class _TestInfo(object):
     """This class keeps useful information about the execution of a
     test method.
@@ -44,11 +58,21 @@
 
     def __init__(self, test_result, test_method, outcome=SUCCESS, err=None):
         self.test_result = test_result
-        self.test_method = test_method
         self.outcome = outcome
         self.elapsed_time = 0
         self.err = err
 
+        #self.test_method = test_method
+        self.test_description = self.test_result.getDescription(test_method)
+        self.test_exception_info = (
+            '' if not self.err
+            else self.test_result._exc_info_to_string(
+                self.err, test_method)
+        )
+
+        self.test_name = testcase_name(test_method)
+        self.test_id = test_method.id()
+
     def test_finished(self):
         """Save info that can only be calculated once a test has run.
         """
@@ -58,16 +82,18 @@
     def get_description(self):
         """Return a text representation of the test method.
         """
-        return self.test_result.getDescription(self.test_method)
+        #return self.test_result.getDescription(self.test_method)
+        return self.test_description
 
     def get_error_info(self):
         """Return a text representation of an exception thrown by a test
         method.
         """
-        if not self.err:
-            return ''
-        return self.test_result._exc_info_to_string(self.err, \
-            self.test_method)
+        #if not self.err:
+        #return ''
+        #return self.test_result._exc_info_to_string(self.err, \
+ 

Re: Avoid unbounded memory consumption when running `manage.py test`

2013-07-26 Thread Ramiro Morales
On Fri, Jul 26, 2013 at 11:03 AM, Matt McClure wrote:
> TestSuite holds references to each TestCase instance. So attributes of my
> TestCase subclasses don't get freed by the garbage collector until the last
> reference to TestSuite disappears, which isn't until the entire test run
> ends. In a large test suite, the test run exhausts memory and the OS kills
> it before the suite finishes.
>
> The most obvious solution to me is to put tearDown methods in all my
> TestCase subclasses that delete the references to their own attributes. But
> it seems like there has to be a more general, automatic solution. I can
> start to answer my own question. But I'm interested to know if others run
> into similar problems and how you solve them.
>
> It looks like there are at least two participants that hold TestCase
> references longer than desirable.
>
> 1.  TestSuite. Here's a minimal hack to release the reference that TestSuite
> would otherwise hold after a TestCase runs until the remainder of the suite
> had finished.
>
> diff --git a/django/utils/unittest/suite.py b/django/utils/unittest/suite.py

django.utils.unittest is a copy of the Python >= 2.7 stdlib unittest library
(aka unittest2).

Given the classes involved in your description, you might want to report
your findings to the upstream maintainers.

That way, if it is confirmed as an issue, the fix will benefit many more
people than just those running tests with Django.

Regards,

-- 
Ramiro Morales
@ramiromorales





Re: Avoid unbounded memory consumption when running `manage.py test`

2013-07-26 Thread Aymeric Augustin
On 26 Jul 2013, at 16:27, Ramiro Morales wrote:

> django.utils.unittest is a copy of the Python >= 2.7 stdlib unittest library
> (aka unittest2)

Besides, it's deprecated in Django 1.7.

-- 
Aymeric.



