Re: GSoC Proposal to extend the parallel test runner to Windows and macOS and to support Oracle parallel test running

Ahmad A. Hussein Mon, 30 Mar 2020 00:14:05 -0700

Thank you Adam for the amazing pointer. That's exactly the solution I was
looking for.


I didn't even know VACUUM INTO was a thing in SQLite (quite nifty). Both
approaches you listed can copy memory-to-memory efficiently. I had also
found two useful APIs in the documentation of Python's sqlite3 module:
Connection.iterdump() and Connection.backup(). The latter exposes the
online backup API you linked.
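A minimal sketch of both standard-library calls, copying one in-memory
database into another within a single process. The `author` table and the
helper names are made up for illustration; `Connection.backup()` needs
Python 3.7 or later.

```python
import sqlite3


def make_source() -> sqlite3.Connection:
    """Build a small in-memory database standing in for the test database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO author (name) VALUES (?)", [("Ada",), ("Grace",)])
    conn.commit()
    return conn


def clone_with_backup(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy every database page via SQLite's online backup API."""
    target = sqlite3.connect(":memory:")
    source.backup(target)  # Connection.backup() exists since Python 3.7
    return target


def clone_with_iterdump(source: sqlite3.Connection) -> sqlite3.Connection:
    """Re-create the database by replaying the SQL text from iterdump()."""
    script = "\n".join(source.iterdump())
    target = sqlite3.connect(":memory:")
    target.executescript(script)
    return target


source = make_source()
for clone in (clone_with_backup(source), clone_with_iterdump(source)):
    # prints [('Ada',), ('Grace',)] for each clone
    print(clone.execute("SELECT name FROM author ORDER BY id").fetchall())
```

backup() copies the database pages directly, while iterdump() yields plain
SQL text, which is what makes it usable as a string handed to another
process.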

I did run into an issue in my implementation, though: memory-to-memory
copying is impossible under the current conditions.

We can't copy memory-to-memory and keep the resulting databases in our
spawned workers, since spawned child processes don't inherit the parent
process's memory.
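The effect can be illustrated inside a single process with the sqlite3
module: every ":memory:" connection is its own separate database, so a
worker that opens ":memory:" after being spawned starts from an empty one.
This is an in-process analogy rather than an actual spawn, and the helper
names are hypothetical.

```python
import sqlite3


def open_fresh_memory_db() -> sqlite3.Connection:
    # Each ":memory:" connection gets a brand-new, empty database, much like
    # a spawned worker that has no access to the parent's in-memory data.
    return sqlite3.connect(":memory:")


def table_names(conn: sqlite3.Connection) -> list:
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


parent_db = open_fresh_memory_db()
parent_db.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")

# Stands in for the connection a freshly spawned child would open.
worker_db = open_fresh_memory_db()

print(table_names(parent_db))  # ['author']
print(table_names(worker_db))  # []
```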

There are two possible workarounds:

1. Copy our in-memory database into an on-disk database using VACUUM INTO,
   then restore it into memory during worker initialization.
2. Pass the SQL string produced by Connection.iterdump() into worker
   initialization and execute it there to restore a copy of the database
   in memory for every child process.

I'm testing out both now to see if there are any major differences. There
might be a performance difference between them. I'm also checking if there
are any unexpected errors that might arise.

Regards,
Ahmad

On Sun, Mar 29, 2020 at 5:56 PM Adam Johnson <m...@adamj.eu> wrote:

>> What I'm currently working out is how to get a SQL dump so I can change
>> SQLite's cloning method, and how to implement an Oracle cloning method.
>> I'm searching through Django's documentation and internal code to see if
>> there is a ready-made SQL dump method for SQLite. If not, I'll search for
>> one in a third-party library, and if I still don't find one, I'll start
>> thinking about a ground-up implementation using the ORM.
>>
>
> SQLite has an online backup API and the "VACUUM INTO" command, both of
> which can efficiently clone databases: https://sqlite.org/backup.html . I
> think they can even copy memory-to-memory with the right arguments.
>
> On Sat, 28 Mar 2020 at 06:59, Ahmad A. Hussein <ahmadahusse...@gmail.com>
> wrote:
>
>> Apologies for the late response. I've had to attend to personal matters
>> concerning the current crisis we're all facing. All is well, though.
>>
>> I should have written a more detailed post from the get-go. I apologize
>> for the lack of clarity as well.
>>
>> Last week, I did exactly what you suggested: I called django.setup() in
>> each child process during worker initialization. This fixed the app
>> registry issues but, as you said, it isn't enough for testing: test apps
>> and test models were missing and caused tons of errors. Later, I read
>> through runtests.py and saw the setup method there. It was exactly what I
>> needed, since it sets up the correct template backend, finds the test
>> apps and adds them to INSTALLED_APPS, and calls django.setup(). I passed
>> that method along with its initial arguments into the runner so I could
>> call it during worker initialization. That fixed all errors related to
>> state initialization. I do wonder, though, whether any meaningful time
>> could be saved by caching the setup instead of calling it in each process.
>>
>> The last glaring hole was correct database connections. I had a naming
>> mismatch with PostgreSQL, which I fixed by prepending "test_" to each
>> cloned database's name during worker initialization. When the start
>> method is fork, we can safely skip that step and it works fine.
>>
>> What I'm currently working out is how to get a SQL dump so I can change
>> SQLite's cloning method, and how to implement an Oracle cloning method.
>> I'm searching through Django's documentation and internal code to see if
>> there is a ready-made SQL dump method for SQLite. If not, I'll search for
>> one in a third-party library, and if I still don't find one, I'll start
>> thinking about a ground-up implementation using the ORM.
>>
>> As for progress on the Oracle cloning method, I'm consulting Oracle
>> documentation right now to see if anything has changed in the last 5 years.
>> If I don't find anything interesting, I'll start toying around with Shai
>> Berger's ideas to see what works and what's performance-costly.
>>
>> Lastly, testing is still an open question for me. I've thought about
>> tests for both the SQLite and Oracle cloning methods, but I can't think
>> of anything else.
>>
>> If you have any pointers, suggestions or feedback, I'd love to hear it!
>> And thank you for your help so far.
>>
>>
>> Regards,
>> Ahmad
>>
>>
>> On Thursday, March 26, 2020 at 10:39:28 AM UTC+2, Aymeric Augustin wrote:
>>>
>>> Hello Ahmad,
>>>
>>> I believe there's interest in supporting parallel test runs on Oracle,
>>> if only to make Django's own test suite faster.
>>>
>>> I'm not very familiar with spawn vs. fork. As far as I understand, spawn
>>> starts a new process, so you'll have to redo some initialization in the
>>> child processes. In a regular application, this would mean calling
>>> `django.setup()`. In Django's own test suite, it might be different; I
>>> don't know. Try it and see what breaks, maybe?
>>>
>>> Hope this helps :-)
>>>
>>> --
>>> Aymeric.
>>>
>>>
>>>
>>> On 23 Mar 2020, at 20:22, Ahmad A. Hussein <ahmadah...@gmail.com> wrote:
>>>
>>> Django's parallel test runner works by forking processes, which makes it
>>> incompatible with Windows, where fork isn't available, and with macOS,
>>> where Python 3.8 made spawn the default start method. Windows and macOS
>>> both support spawn and use it by default. Databases are cloned for each
>>> worker.
>>>
>>> Switching from fork to spawn means state setup has to be done by each
>>> spawned process instead of being inherited through fork. Workers can
>>> still connect to their databases through get_test_db_clone_settings,
>>> which changes the names of the databases assigned to a specific
>>> worker_id; however, SQLite's cloning method is incompatible with spawn.
>>>
>>>
>>> SQLite's cloning method relies on the database being in-memory and on
>>> fork: when we fork the main process, the child process retains SQLite's
>>> in-memory database. The solution is to take a SQL dump of the database
>>> and load it into each target cloned database. This is also how MySQL's
>>> cloning is already handled. Both PostgreSQL's and MySQL's cloning
>>> methods are independent of fork or spawn and won't require any
>>> modification.
>>>
>>> Oracle was never supported by the parallel test runner, even on Linux,
>>> and my proposal would extend support to Oracle as well. I want to
>>> confirm whether there is significant support for this feature before I
>>> commit to writing a specification, but in summary it is definitely
>>> possible: the work Aymeric Augustin and Shai Berger collaborated on shows
>>> that cloning CAN be done in several ways. It's a headache because,
>>> unlike our other supported databases, Oracle does not support separate
>>> databases under a single user, so we can't clone databases without
>>> introducing another user. Some methods may also need to be rewritten to
>>> accommodate the Oracle backend, but that isn't an issue. I've skipped
>>> writing out a schedule or a more detailed abstract because I'm mainly
>>> posting here to see whether there is indeed support for the Oracle
>>> proposal, and to make sure I'm not missing any details about adapting
>>> the current parallel test runner to work through spawn. Let me know what
>>> you think.
>>>
>>> Regards,
>>> Ahmad
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Django developers (Contributions to Django itself)" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to django-d...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/django-developers/317f67c6-4b23-483f-ada5-9bdbb45d0997%40googlegroups.com
>>> <https://groups.google.com/d/msgid/django-developers/317f67c6-4b23-483f-ada5-9bdbb45d0997%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>>
>>
>
>
> --
> Adam
>
>
