Re: Testing multidb with TEST_MIRROR

Anssi Kääriäinen Thu, 06 Sep 2012 09:17:00 -0700

On 3 Sep, 07:40, Anssi Kääriäinen <anssi.kaariai...@thl.fi> wrote:
> I would like to make the TransactionTestCase faster. Currently when
> running Django's test suite, for every test run you will truncate
> around 1000 tables, then create around 4000 objects (permissions +
> content types). Likely you will write to one or two tables in the
> test, and then do the truncate/recreate dance again. There is room for
> improvement. Track which tables have been modified and refresh only
> those tables.


I now have a pretty good WIP approach for tracking changes during
testing. The changes can be found here:
[https://github.com/akaariai/django/tree/fast_tests_merged]. The
approach relies on existing signals plus a new "model_changed" signal,
which is used to signal bulk_create and .update() changes, and could be
used for raw SQL changes, too.
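
To make the idea concrete, here is a minimal sketch in plain Python. It
is not the code from the branch: DirtyTableTracker, mark_changed and the
dict standing in for the database are hypothetical names used only for
illustration. In the branch, the marking is driven by the signals
described above.

```python
# Hypothetical illustration only: a dict of lists stands in for the database,
# and mark_changed() stands in for a signal receiver (for example one
# connected to post_save, post_delete or the proposed model_changed signal).

class DirtyTableTracker:
    def __init__(self, db, baseline):
        self.db = db                # table name -> list of rows
        self.baseline = baseline    # table name -> rows expected before each test
        self.dirty = set()          # tables written to since the last reset
        self.refreshed = []         # kept only to show which tables were refreshed

    def mark_changed(self, table):
        """Record that a write to `table` was signalled."""
        self.dirty.add(table)

    def reset(self):
        """Restore only the tables touched since the last reset."""
        for table in sorted(self.dirty):
            self.db[table] = list(self.baseline.get(table, []))
            self.refreshed.append(table)
        self.dirty.clear()


baseline = {"auth_permission": ["add_user", "change_user"], "app_book": []}
db = {name: list(rows) for name, rows in baseline.items()}
tracker = DirtyTableTracker(db, baseline)

# A test writes to one table, and the write is signalled to the tracker.
db["app_book"].append("some book")
tracker.mark_changed("app_book")

# After the test only app_book is refreshed; auth_permission is left alone.
tracker.reset()
```

The point of the design is that the cost of the reset scales with the
number of tables a test actually wrote to, not with the total number of
tables in the test database.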

The results are promising:
  - PostgreSQL with Selenium tests, master: 33m4s, patched: 9m51s
  - SQLite, no Selenium tests, master: 7m35s, patched: 3m6s

So, more than a 3x speedup on PostgreSQL and more than a 2x speedup on
SQLite. Patched PostgreSQL without Selenium tests takes 7m14s, so it is
slightly faster than SQLite on master.
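
For anyone who wants to check the ratios, the arithmetic can be redone
from the "real" times in the raw test data at the end of this message.
This is just a small helper sketch; the function and variable names are
made up for the example.

```python
def seconds(minutes, secs):
    """Convert a `time`-style NmS.SSSs value to seconds."""
    return minutes * 60 + secs


# (master, patched) wall-clock times taken from the raw test data.
runs = {
    "PostgreSQL with Selenium": (seconds(33, 3.587), seconds(9, 50.534)),
    "SQLite without Selenium": (seconds(7, 34.566), seconds(3, 6.210)),
}

speedups = {name: master / patched for name, (master, patched) in runs.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

This gives roughly 3.4x for PostgreSQL with Selenium and roughly 2.4x
for SQLite without Selenium.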

The current approach uses two flags in TransactionTestCase to control
state tracking: flush_all and detect_leaks. The first instructs the
tester to do a full DB flush after the test; the latter can be used to
check whether dirty state is leaked. However, it currently tracks only
object counts, so updates could be missed. Making it more intelligent
will be somewhat hard, as we can't compare the objects directly: the
autoincrement primary keys will not match.
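
The limitation of count-based leak detection can be shown with a small
plain-Python sketch. Again, this is not the code from the branch;
table_counts and leaked_tables are hypothetical helpers and a dict
stands in for the database.

```python
def table_counts(db):
    """Snapshot the number of rows in each table."""
    return {table: len(rows) for table, rows in db.items()}


def leaked_tables(before, after):
    """Return the tables whose row count differs between two snapshots."""
    tables = set(before) | set(after)
    return sorted(t for t in tables if before.get(t, 0) != after.get(t, 0))


db = {"app_book": [{"title": "A"}], "app_author": []}
before = table_counts(db)

db["app_author"].append({"name": "X"})  # insert: the row count changes
db["app_book"][0]["title"] = "B"        # update: the row count stays the same

leaks = leaked_tables(before, table_counts(db))
```

Here the insert into app_author is reported as a leak, but the update
to app_book goes unnoticed because its row count did not change.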

The default for flush_all is False in the patched version. That default
of course wouldn't be acceptable if this were included in core, as
existing applications would break. IMO it should be a setting which
defaults to True.

The detect_leaks flag in TransactionTestCase should instead be a flag
to manage.py test. This way, if there are mysterious failures, it should
be somewhat straightforward to detect them by just passing that flag to
the test runner.

The patch also contains a cleanup of the loaddata command. In master the
command is slow and complex. Now it is faster (around 50s removed by
the loaddata cleanup alone IIRC), and somewhat less complex. The patch
isn't 100% ready. Once it is, I am planning to commit it separately from
the state tracking work.

The __deepcopy__ removal patch from #16759 is also still in there.
It removes just 15 seconds of runtime compared to loaddata + state
tracking, but I like to keep bringing this patch up... :)

I also tried to track installed fixture state across tests. We often
repeatedly load the same fixtures. However, this got too complex, as
we would then need to track state between tests instead of inside one test.
For my needs a better approach would be a "base_data" fixture which is
loaded for all tests, and a test that dirties the base data would be
responsible for cleaning up and reloading the fixture. One could
already implement this using the syncdb signal, as this is basically
what is done for permissions and contenttypes. The use case for me
would be mass-loading external data (medical ICD codes for example)
before testing. Reloading such codes for each test case is
horrendously slow.

I would like to get feedback on the suggested API
(settings.TRACK_TEST_STATE, --detect-state-leaks flag for manage.py
test), and on whether we want something like this at all in Django. This
could work as an external test class, too, if we added some of the needed
hooks to Django. This would of course mean the benefits would not be there
for Django core testing. The ability to restrict flushing to a subset of
models and the new signal would at least need to be in core.

 - Anssi

Raw test data:
----------------------------
Patched, Postgres, Selenium:
Ran 4850 tests in 520.710s

OK (skipped=145, expected failures=4)
Destroying test database for alias 'default'...
Destroying test database for alias 'other'...

real    9m50.534s
user    4m3.283s
sys     0m19.497s

---------------------------
Master, Postgres, Selenium:
Ran 4848 tests in 1932.700s

OK (skipped=145, expected failures=4)
Destroying test database for alias 'default'...
Destroying test database for alias 'other'...

real    33m3.587s
user    17m55.659s
sys     0m50.519s

-------------------------------
Patched, Postgres, no selenium:
Ran 4850 tests in 389.648s

OK (skipped=145, expected failures=4)
Destroying test database for alias 'default'...
Destroying test database for alias 'other'...

real    7m14.441s
user    4m4.699s
sys     0m19.853s

-----------------------------
Patched, SQLite, no selenium:
Ran 4850 tests in 176.807s

OK (skipped=173, expected failures=4)
Destroying test database for alias 'default'...
Destroying test database for alias 'other'...

real    3m6.210s
user    2m21.041s
sys     0m12.521s

----------------------------
Master, SQLite, no selenium:
Ran 4848 tests in 444.018s

OK (skipped=173, expected failures=4)
Destroying test database for alias 'default'...
Destroying test database for alias 'other'...

real    7m34.566s
user    6m45.165s
sys     0m16.777s

