Re: GSoC Proposal to extend the parallel test runner to Windows and macOS and to support Oracle parallel test running

Ahmad A. Hussein Mon, 30 Mar 2020 05:13:51 -0700

You're definitely right!

I just finished rewriting it to re-run migrations and setup in each worker.
I haven't benchmarked it yet, so I'll do that later, but as a gut reaction
it isn't significantly slower; of course, the numbers will tell the real
story.


Passing the dump string through shared memory or converting the in-memory
database to an on-disk one seems like overkill now. The performance drop
probably won't be significant, especially since this is SQLite.

I'll update you here with the exact benchmarks later. I'm busy now with
school work, but I'll get back to it again later today.

It is definitely exciting to find out!

Regards,
Ahmad

On Mon, Mar 30, 2020 at 11:46 AM Tom Forbes <t...@tomforb.es> wrote:

> Awesome! However there is always a tradeoff between speed and code
> complexity - it might not be worth a more complex system involving shared
> memory or copying SQLite files in other ways if it only saves ~5 seconds at
> the start of a test run compared to re-running the migrations in each
> worker. I guess this might be unique to SQLite where things are generally
> faster.
>
> Just something to keep in mind, I’m excited to see the results though!
>
> Tom
>
> On 30 Mar 2020, at 10:34, Ahmad A. Hussein <ahmadahusse...@gmail.com>
> wrote:
>
>  I did stumble upon shared memory when I originally wanted to pass the
> database connection itself into shared memory. Your idea of passing the
> query string is MUCH more reasonable, can be done using shared memory, and
> will ultimately be much more cleanly written.
>
> I'll definitely test whether it gives a significant performance boost and
> see how it compares to other methods.
>
> I did consider copying the SQLite database file itself. That's essentially
> what's happening with VACUUM INTO. It's a SQLite command that lets you
> clone a source database whether in-memory or on disk into a database file.
> I got the parallel test runner to work with these database files, creating
> a database file for each worker. I think there's a performance boost to
> be had if we make the workers' databases in-memory, especially as the test
> suite and tables grow in number; however, we also need to consider the time
> it takes to load the databases into memory.
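
A minimal sketch of that file-per-worker flow, using only Python's standard sqlite3 module rather than the actual runner code. The `book` table, the `worker_N.sqlite3` file names and both helper functions are made up for illustration. VACUUM INTO needs SQLite 3.27 or newer and refuses to overwrite an existing file, so the demo is skipped on older builds.

```python
import os
import sqlite3
import tempfile


def clone_to_file(source, path):
    """Write a snapshot of `source` (in-memory or on-disk) to a new file."""
    # VACUUM cannot run inside an open transaction, so commit first.
    source.commit()
    source.execute("VACUUM INTO ?", (path,))


def load_into_memory(path):
    """Copy an on-disk database into a fresh in-memory connection."""
    disk = sqlite3.connect(path)
    memory = sqlite3.connect(":memory:")
    try:
        disk.backup(memory)  # Connection.backup() needs Python 3.7+
    finally:
        disk.close()
    return memory


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
source.execute("INSERT INTO book (title) VALUES ('Two Scoops')")

worker_counts = []
if sqlite3.sqlite_version_info >= (3, 27, 0):
    with tempfile.TemporaryDirectory() as tmp:
        for worker_id in (1, 2):
            # One clone file per worker, then pulled back into memory.
            path = os.path.join(tmp, f"worker_{worker_id}.sqlite3")
            clone_to_file(source, path)
            worker_db = load_into_memory(path)
            worker_counts.append(
                worker_db.execute("SELECT COUNT(*) FROM book").fetchone()[0]
            )
            worker_db.close()
```

The extra `load_into_memory` step is the cost mentioned above: each worker pays for one file read before its tests can run against memory.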
>
> Re-running migrations and recreating the SQLite database from scratch for
> every worker might be a time-sink, but it could actually prove quicker than
> other methods. I'll try it out as well and see.
>
> Thank you for your suggestions
>
> Regards,
> Ahmad
>
> On Mon, Mar 30, 2020 at 9:39 AM Tom Forbes <t...@tomforb.es> wrote:
>
>> There is an interesting addition to the standard library in Python 3.8:
>> multiprocessing.shared_memory (
>> https://docs.python.org/3/library/multiprocessing.shared_memory.html).
>> It’s only available in 3.8+, but it might be worth seeing whether it
>> performs better than passing a large string into each worker's initializer.
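
A rough sketch of what handing a dump string over through multiprocessing.shared_memory could look like. The helper names `publish_dump` and `read_dump` and the sample script are hypothetical, and for simplicity the "worker" read happens in the same process here; in a real runner only the block name and length would be passed to each worker.

```python
from multiprocessing import shared_memory


def publish_dump(sql_script):
    """Copy the dump into a new shared memory block (parent side)."""
    payload = sql_script.encode("utf-8")
    # SharedMemory rejects size=0, so an empty dump would need special-casing.
    block = shared_memory.SharedMemory(create=True, size=len(payload))
    block.buf[: len(payload)] = payload
    return block, len(payload)


def read_dump(name, length):
    """Attach to an existing block by name and decode it (worker side)."""
    block = shared_memory.SharedMemory(name=name)
    try:
        # The block may be rounded up to a page boundary, so slice explicitly.
        return bytes(block.buf[:length]).decode("utf-8")
    finally:
        block.close()


script = "CREATE TABLE t (x INTEGER);\nINSERT INTO t VALUES (1);\n"
block, length = publish_dump(script)
try:
    # Only the short name and the length have to travel to the reader.
    restored = read_dump(block.name, length)
finally:
    block.close()
    block.unlink()  # the creating process frees the block when done
```

Whether this beats pickling the string into the initializer arguments is exactly the open question; the sketch only shows the mechanics.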
>>
>> Have you also considered simply re-running the migrations and creating
>> the SQLite database from scratch in each worker, or simply copying the
>> SQLite database file itself?
>>
>> On 30 Mar 2020, at 08:13, Ahmad A. Hussein <ahmadahusse...@gmail.com>
>> wrote:
>>
>> 
>> Thank you Adam for the amazing pointer. That's exactly the solution I was
>> looking for.
>>
>> I didn't even know VACUUM INTO was a thing in SQLite3 (quite nifty). Both
>> sources you listed can definitely copy memory-to-memory efficiently. I had
>> also found two useful APIs in the standard library's documentation:
>> iterdump() and backup(). The latter is a wrapper around the online backup
>> API you linked.
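
For reference, a small sketch of a memory-to-memory copy with Connection.backup() inside a single process (Python 3.7+). The `tag` table and its rows are invented for the example.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE tag (name TEXT)")
source.executemany("INSERT INTO tag VALUES (?)", [("orm",), ("admin",)])
source.commit()

clone = sqlite3.connect(":memory:")
source.backup(clone)  # memory-to-memory copy via SQLite's online backup API

# The clone is independent: later writes to it do not reach the source.
clone.execute("DELETE FROM tag WHERE name = 'admin'")
clone.commit()

source_rows = source.execute("SELECT name FROM tag ORDER BY name").fetchall()
clone_rows = clone.execute("SELECT name FROM tag ORDER BY name").fetchall()
```

Both connections live in the same process here, which is the part that does not carry over to spawned workers.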
>>
>> I did face an issue in my implementation though. Memory-to-memory copying
>> is impossible under the current conditions.
>>
>> We can't copy memory-to-memory and retain the resultant databases into
>> our spawned workers since spawned child processes don't share an
>> environment with parent processes.
>>
>> There are two possible workarounds:
>>   1. Copy our in-memory database into an on-disk database using VACUUM
>>      INTO and subsequently restore it into memory during worker
>>      initialization.
>>   2. Pass the SQL script string produced by Connection.iterdump() into
>>      worker initialization and execute it there to restore a copy of the
>>      database into memory for every child process.
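
The second workaround can be sketched with the standard library alone. The helper names `dump_database` and `restore_database` and the `author` table are illustrative assumptions; in a real runner the restore step would run inside each worker's initializer rather than in the same process.

```python
import sqlite3


def dump_database(source):
    """Serialize schema and data as one SQL script string."""
    return "\n".join(source.iterdump())


def restore_database(sql_script):
    """Rebuild an independent in-memory database from a dump string."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_script)
    return conn


# Parent side: build the database once and dump it to a plain string.
parent = sqlite3.connect(":memory:")
parent.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
parent.execute("INSERT INTO author (name) VALUES ('Ada')")
parent.commit()
script = dump_database(parent)

# Worker side: the string is picklable, so it can cross a spawn boundary.
worker_db = restore_database(script)
worker_db.execute("INSERT INTO author (name) VALUES ('Grace')")
worker_db.commit()

parent_rows = parent.execute("SELECT name FROM author ORDER BY id").fetchall()
worker_rows = worker_db.execute("SELECT name FROM author ORDER BY id").fetchall()
```

Unlike a connection object, the dump is just text, which is why it survives being sent to a spawned child.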
>>
>> I'm testing out both now to see if there are any major differences. There
>> might be a performance difference between them. I'm also checking if there
>> are any unexpected errors that might arise.
>>
>> Regards,
>> Ahmad
>>
>> On Sun, Mar 29, 2020 at 5:56 PM Adam Johnson <m...@adamj.eu> wrote:
>>
>>>> Currently what I'm figuring out is getting a SQL dump to change SQLite's
>>>> cloning method and implementing an Oracle cloning method. I'm searching
>>>> through Django's documentation and internal code to see if there is a
>>>> ready-made SQL dump method for SQLite and if not I'll search for it in a
>>>> third-party library, and if I still don't find it, I'll start thinking
>>>> about a ground-up implementation using the ORM.
>>>>
>>>
>>> SQLite has an online backup API and the "VACUUM INTO" command both of
>>> which can efficiently clone databases: https://sqlite.org/backup.html .
>>> I think they can even copy memory-to-memory with the right arguments.
>>>
>>> On Sat, 28 Mar 2020 at 06:59, Ahmad A. Hussein <ahmadahusse...@gmail.com>
>>> wrote:
>>>
>>>> Apologies for the late response. I've had to attend to personal matters
>>>> concerning the current crisis we're all facing. All is well though.
>>>>
>>>> I should have posted a more detailed post from the get-go. I apologize
>>>> for the lack of clarity as well.
>>>>
>>>> Last week, I initially did exactly what you suggested. I called
>>>> django.setup() in each child process during worker initialization. This
>>>> fixed app registry issues but like you said it isn't enough for testing.
>>>> Test apps and test models were missing and caused tons of errors. Later, I
>>>> read through runtests.py and saw the setup method there; it was exactly
>>>> what I needed since it set up the correct template backend,
>>>> searched for test apps and added them to installed apps, and called
>>>> django.setup(). I passed that method along with its initial arguments into
>>>> the runner so I could call it during worker initialization. That fixed all
>>>> errors related to state initialization. I do wonder though if any
>>>> meaningful time could be saved if we use a cache for setup instead of
>>>> having to call the setup for each process.
>>>>
>>>> The last glaring hole was correct database connections. I had a naming
>>>> mismatch with Postgres and I ended up fixing that through prepending
>>>> "test_" in each cloned database's name during worker initialization. In
>>>> case of the start method being fork, we can safely ignore that step and
>>>> it'll work fine.
>>>>
>>>> Currently what I'm figuring out is getting a SQL dump to change
>>>> SQLite's cloning method and implementing an Oracle cloning method. I'm
>>>> searching through Django's documentation and internal code to see if there
>>>> is a ready-made SQL dump method for SQLite and if not I'll search for it in
>>>> a third-party library, and if I still don't find it, I'll start thinking
>>>> about a ground-up implementation using the ORM.
>>>>
>>>> As for progress on the Oracle cloning method, I'm consulting Oracle
>>>> documentation right now to see if anything has changed in the last 5 years.
>>>> If I don't find anything interesting, I'll start toying around with Shai
>>>> Berger's ideas to see what works and what's performance-costly.
>>>>
>>>> Lastly, testing does seem to be an enigma to think about right now.
>>>> I've thought about tests for both SQLite's and Oracle's cloning method, but
>>>> I can't imagine anything else.
>>>>
>>>> If you have any pointers, suggestions or feedback, I'd love to hear it!
>>>> And thank you for your help so far.
>>>>
>>>>
>>>> Regards,
>>>> Ahmad
>>>>
>>>>
>>>> On Thursday, March 26, 2020 at 10:39:28 AM UTC+2, Aymeric Augustin
>>>> wrote:
>>>>>
>>>>> Hello Ahmad,
>>>>>
>>>>> I believe there's interest for supporting parallel test runs on
>>>>> Oracle, if only to make Django's own test suite faster.
>>>>>
>>>>> I'm not very familiar with spawn vs. fork. As far as I understand,
>>>>> spawn starts a new process, so you'll have to redo some initialization in
>>>>> the child processes. In a regular application, this would mean calling
>>>>> `django.setup()`. In Django's own test suite, it might be different; I
>>>>> don't know. Try it and see what breaks, maybe?
>>>>>
>>>>> Hope this helps :-)
>>>>>
>>>>> --
>>>>> Aymeric.
>>>>>
>>>>>
>>>>>
>>>>> On 23 Mar 2020, at 20:22, Ahmad A. Hussein <ahmadah...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Django's parallel test runner works through forking processes, making
>>>>> it incompatible with Windows by default and incompatible with macOS due
>>>>> to a recent update. Windows and macOS both support spawn and have it
>>>>> enabled by default. Databases are cloned for each worker.
>>>>>
>>>>> To switch from fork to spawn, state setup will be handled by spawned
>>>>> processes instead of through inheritance via fork. Workers' connections to
>>>>> databases can still be set up through get_test_db_clone_settings, which
>>>>> changes the names of databases assigned to a specific worker_id; however,
>>>>> SQLite's cloning method is incompatible with spawn.
>>>>>
>>>>>
>>>>> SQLite’s cloning method relies on both the database being in-memory and
>>>>> the fork start method: when we fork the main process, the child process
>>>>> retains SQLite's in-memory database. The solution is to get a SQL dump of
>>>>> the database and load it into the target cloned databases. This is also
>>>>> the established standard in handling MySQL’s cloning process. Both
>>>>> PostgreSQL's and MySQL's cloning methods are independent of fork or spawn
>>>>> and won't require any modification.
>>>>>
>>>>> Oracle was never supported by the parallel test runner, even on Linux,
>>>>> and I propose to extend support to Oracle as well. I want to confirm
>>>>> whether there is significant support behind this as a feature before I
>>>>> commit to writing a specification, but in summary it is definitely
>>>>> possible: the work by Aymeric Augustin and Shai Berger shows that cloning
>>>>> CAN be done in several ways. The reason why it's a headache is that
>>>>> Oracle does not support separate databases under a single user, unlike
>>>>> our other supported databases, so we can't clone databases without
>>>>> introducing another user. Some methods may also need to be rewritten to
>>>>> accommodate the Oracle backend, but that isn't an issue. I've glossed
>>>>> over writing out a schedule or a more detailed abstract as I'm mainly
>>>>> posting here to see if there is indeed support for the Oracle proposal
>>>>> and to make sure I am not missing any details in regard to adapting the
>>>>> current parallel test runner to work through spawn. Let me know what you
>>>>> think.
>>>>>
>>>>> Regards,
>>>>> Ahmad
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Django developers (Contributions to Django itself)" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to django-d...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/django-developers/317f67c6-4b23-483f-ada5-9bdbb45d0997%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/django-developers/317f67c6-4b23-483f-ada5-9bdbb45d0997%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Adam
>>>
>>>
>>
>>
>>
>>
>
>
>
>

