Re: GSoC Proposal to extend the parallel test runner to Windows and macOS and to support Oracle parallel test running

Ahmad A. Hussein Mon, 30 Mar 2020 02:35:09 -0700

I did stumble upon shared memory when I originally wanted to pass the
database connection itself into shared memory. Your idea of passing the
query string is MUCH more reasonable, can be done using shared memory, and
will ultimately be much cleaner to write.


I'll definitely test whether it gives a significant performance boost and
see how it compares to the other methods.
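
As a rough sketch of the query-string idea, assuming Python 3.8+ for
multiprocessing.shared_memory: the parent serializes the database with
Connection.iterdump(), copies the script into a shared memory block, and a
worker attaches by name and replays it into its own in-memory database. The
helper names (publish_dump, restore_from_shared_memory) and the sample table
are hypothetical and not part of Django. For simplicity the "worker" step
runs in the same process here; a real runner would call it from each
worker's initializer.

```python
import sqlite3
from multiprocessing import shared_memory


def publish_dump(source):
    """Serialize `source` with iterdump() and copy the script into shared memory.

    Returns the SharedMemory block and the payload length. The length is
    returned separately because the block may be rounded up to a page size.
    """
    payload = "\n".join(source.iterdump()).encode("utf-8")
    block = shared_memory.SharedMemory(create=True, size=len(payload))
    block.buf[:len(payload)] = payload
    return block, len(payload)


def restore_from_shared_memory(name, length):
    """What a worker would do: attach by name and rebuild an in-memory copy."""
    block = shared_memory.SharedMemory(name=name)
    try:
        # bytes() copies the data out, so no view is left open on the buffer.
        script = bytes(block.buf[:length]).decode("utf-8")
    finally:
        block.close()
    clone = sqlite3.connect(":memory:")
    clone.executescript(script)
    return clone


# Hypothetical stand-in for the already-migrated test database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO author (name) VALUES (?)", [("Ada",), ("Grace",)])
source.commit()

block, length = publish_dump(source)
try:
    clone = restore_from_shared_memory(block.name, length)
    rows = clone.execute("SELECT name FROM author ORDER BY id").fetchall()
finally:
    # The creating process is responsible for releasing the block.
    block.close()
    block.unlink()
```

Only the block's name and the payload length would need to be passed to the
workers, which is the part whose cost I want to compare against passing the
whole string.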

I did consider copying the SQLite database file itself. That's essentially
what happens with VACUUM INTO: it's a SQLite command that clones a source
database, whether in-memory or on disk, into a new database file. I got the
parallel test runner to work with these database files, creating one
database file per worker. I think there's a performance boost to be had if
we make the workers' databases in-memory, especially as the test suite and
tables grow in number; however, we also need to account for the time it
takes to load each database into memory.
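
A minimal sketch of that file-based approach, assuming SQLite 3.27+ (for
VACUUM INTO) and Python 3.7+ (for Connection.backup()). The function names,
file naming scheme, and sample table are made up for illustration; this is
not the code in my branch.

```python
import os
import sqlite3
import tempfile


def clone_to_files(source, directory, worker_count):
    """Write one on-disk copy of `source` per worker using VACUUM INTO."""
    paths = []
    for worker_id in range(1, worker_count + 1):
        path = os.path.join(directory, f"clone_{worker_id}.sqlite3")
        # VACUUM INTO fails if the target file already exists and is not
        # empty, and it cannot run inside an open transaction.
        source.execute("VACUUM INTO ?", (path,))
        paths.append(path)
    return paths


def load_into_memory(path):
    """Copy an on-disk clone into a fresh in-memory database."""
    disk = sqlite3.connect(path)
    memory = sqlite3.connect(":memory:")
    try:
        disk.backup(memory)
    finally:
        disk.close()
    return memory


# Hypothetical stand-in for the already-migrated test database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
source.execute("INSERT INTO book (title) VALUES ('Example')")
source.commit()

with tempfile.TemporaryDirectory() as directory:
    paths = clone_to_files(source, directory, worker_count=2)
    # In a real runner each worker would load only its own file.
    clones = [load_into_memory(path) for path in paths]

titles = [clone.execute("SELECT title FROM book").fetchall() for clone in clones]
```

The load_into_memory step is the extra cost mentioned above: it is paid once
per worker, so it only pays off if the tests then run noticeably faster
against memory than against the file.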

Re-running migrations and recreating the SQLite database from scratch for
every worker might be a time-sink, but it could actually prove quicker than
other methods. I'll try it out as well and see.

Thank you for your suggestions.

Regards,
Ahmad

On Mon, Mar 30, 2020 at 9:39 AM Tom Forbes <t...@tomforb.es> wrote:

> There is an interesting addition to the standard library in Python 3.8:
> multiprocessing.shared_memory (
> https://docs.python.org/3/library/multiprocessing.shared_memory.html).
> It’s only on 3.8 but it might be worth seeing if that has a performance
> improvement over passing a large string into the workers init process.
>
> Have you also considered simply re-running the migrations and creating the
> SQLite database from scratch in each worker, or simply copying the SQLite
> database file itself?
>
> On 30 Mar 2020, at 08:13, Ahmad A. Hussein <ahmadahusse...@gmail.com>
> wrote:
>
> 
> Thank you Adam for the amazing pointer. That's exactly the solution I was
> looking for.
>
> I didn't even know VACUUM INTO was a thing in SQLite3 (quite nifty). Both
> sources you listed can definitely copy memory-to-memory efficiently. I had
> also found two useful APIs in the standard library's documentation:
> iterdump() and backup(). The latter is an attempt at recreating the online
> backup API you linked.
>
> I did face an issue in my implementation though. Memory-to-memory copying
> is impossible under the current conditions.
>
> We can't copy memory-to-memory and keep the resulting databases in our
> spawned workers, since spawned child processes don't inherit the parent
> process's memory.
>
> There are two possible workarounds:
>
> 1. Copy our in-memory database into an on-disk database using VACUUM INTO
>    and subsequently restore it into memory during worker initialization.
> 2. Pass the query string resulting from calling Connection.iterdump() into
>    worker initialization and execute it to restore a copy of the database
>    into memory for every child process.
>
> I'm testing out both now to see if there are any major differences. There
> might be a performance difference between them. I'm also checking if there
> are any unexpected errors that might arise.
>
> Regards,
> Ahmad
>
> On Sun, Mar 29, 2020 at 5:56 PM Adam Johnson <m...@adamj.eu> wrote:
>
>>> Currently what I'm figuring out is getting a SQL dump to change SQLite's
>>> cloning method and implementing an Oracle cloning method. I'm searching
>>> through Django's documentation and internal code to see if there is a
>>> ready-made SQL dump method for SQLite and if not I'll search for it in a
>>> third-party library, and if I still don't find it, I'll start thinking
>>> about a ground-up implementation using the ORM.
>>>
>>
>> SQLite has an online backup API and the "VACUUM INTO" command both of
>> which can efficiently clone databases: https://sqlite.org/backup.html .
>> I think they can even copy memory-to-memory with the right arguments.
>>
>> On Sat, 28 Mar 2020 at 06:59, Ahmad A. Hussein <ahmadahusse...@gmail.com>
>> wrote:
>>
>>> Apologies for the late response. I've had to attend to personal matters
>>> concerning the current crisis we're all facing. All is well though.
>>>
>>> I should have posted a more detailed post from the get-go. I apologize
>>> for the lack of clarity as well.
>>>
>>> Last week, I initially did exactly what you suggested. I called
>>> django.setup() in each child process during worker initialization. This
>>> fixed app registry issues but like you said it isn't enough for testing.
>>> Test apps and test models were missing and caused tons of errors. Later, I
>>> read through runtests.py and saw the setup method there; it was exactly
>>> what I needed since it set up the correct template backend,
>>> searched for test apps and added them to installed apps, and called
>>> django.setup(). I passed that method along with its initial arguments into
>>> the runner so I could call it during worker initialization. That fixed all
>>> errors related to state initialization. I do wonder though if any
>>> meaningful time could be saved if we use a cache for setup instead of
>>> having to call the setup for each process.
>>>
>>> The last glaring hole was correct database connections. I had a naming
>>> mismatch with Postgres and I ended up fixing that through prepending
>>> "test_" in each cloned database's name during worker initialization. In
>>> case of the start method being fork, we can safely ignore that step and
>>> it'll work fine.
>>>
>>> Currently what I'm figuring out is getting a SQL dump to change SQLite's
>>> cloning method and implementing an Oracle cloning method. I'm searching
>>> through Django's documentation and internal code to see if there is a
>>> ready-made SQL dump method for SQLite and if not I'll search for it in a
>>> third-party library, and if I still don't find it, I'll start thinking
>>> about a ground-up implementation using the ORM.
>>>
>>> As for progress on the Oracle cloning method, I'm consulting Oracle
>>> documentation right now to see if anything has changed in the last 5 years.
>>> If I don't find anything interesting, I'll start toying around with Shai
>>> Berger's ideas to see what works and what's performance-costly.
>>>
>>> Lastly, testing does seem to be an enigma to think about right now. I've
>>> thought about tests for both SQLite's and Oracle's cloning method, but I
>>> can't imagine anything else.
>>>
>>> If you have any pointers, suggestions or feedback, I'd love to hear it!
>>> And thank you for your help so far.
>>>
>>>
>>> Regards,
>>> Ahmad
>>>
>>>
>>> On Thursday, March 26, 2020 at 10:39:28 AM UTC+2, Aymeric Augustin wrote:
>>>>
>>>> Hello Ahmad,
>>>>
>>>> I believe there's interest for supporting parallel test runs on Oracle,
>>>> if only to make Django's own test suite faster.
>>>>
>>>> I'm not very familiar with spawn vs. fork. As far as I understand,
>>>> spawn starts a new process, so you'll have to redo some initialization in
>>>> the child processes. In a regular application, this would mean calling
>>>> `django.setup()`. In Django's own test suite, it might be different; I
>>>> don't know. Try it and see what breaks, maybe?
>>>>
>>>> Hope this helps :-)
>>>>
>>>> --
>>>> Aymeric.
>>>>
>>>>
>>>>
>>>> On 23 Mar 2020, at 20:22, Ahmad A. Hussein <ahmadah...@gmail.com>
>>>> wrote:
>>>>
>>>> Django's parallel test runner works by forking processes, making it
>>>> incompatible with Windows and, due to a recent update, with macOS by
>>>> default. Windows and macOS both support spawn and use it as their
>>>> default start method. Databases are cloned for each worker.
>>>>
>>>> To switch from fork to spawn, state setup will be handled by spawned
>>>> processes instead of through inheritance via fork. Workers' connections to
>>>> databases can still be done through get_test_db_clone_settings which
>>>> changes the names of databases assigned to a specific worker_id; however,
>>>> SQLite's cloning method is incompatible with spawn.
>>>>
>>>>
>>>> SQLite’s cloning method relies on both the database being in-memory and
>>>> the start method being fork: when we fork the main process, the child
>>>> process retains SQLite's in-memory database. The solution is to get a SQL
>>>> dump of the database and load it into the target cloned databases. This is
>>>> also the established approach in MySQL’s cloning process. Both PostgreSQL's
>>>> and MySQL's cloning methods are independent of fork or spawn and won't
>>>> require any modification.
>>>>
>>>> Oracle support was never implemented in the parallel test runner, even
>>>> on Linux, and I propose to extend support to Oracle as well in my
>>>> proposal. I want to confirm whether there is significant support behind
>>>> this as a feature before I commit to writing a specification, but in
>>>> summary it is definitely possible: the work done by Aymeric Augustin and
>>>> Shai Berger shows that cloning CAN be done in several ways. The reason
>>>> it's a headache is that Oracle, unlike our other supported databases,
>>>> does not support separate databases under a single user, so we can't
>>>> clone databases without introducing another user. Some methods may also
>>>> need to be rewritten to accommodate the Oracle backend, but that isn't an
>>>> issue.
>>>>
>>>> I've glossed over writing out a schedule or a more detailed abstract, as
>>>> I'm mainly posting here to see if there is indeed support for the Oracle
>>>> proposal and to make sure I am not missing any details regarding adapting
>>>> the current parallel test runner to work through spawn. Let me know what
>>>> you think.
>>>>
>>>> Regards,
>>>> Ahmad
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Django developers (Contributions to Django itself)" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to django-d...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/django-developers/317f67c6-4b23-483f-ada5-9bdbb45d0997%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/django-developers/317f67c6-4b23-483f-ada5-9bdbb45d0997%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>>
>>>
>>
>>
>> --
>> Adam
>>

