Non-relational database support for the Django ORM
==================================================

Note: I am withdrawing my proposal on template compilation. Another student has expressed some interest in working on it, and in any event I am now more interested in working on this project.

About Me
~~~~~~~~

I'm a sophomore computer science student at Rensselaer Polytechnic Institute. I'm a frequent contributor to Django (including last year's successful multiple database GSoC project) and other related projects; I'm also a committer on both `Unladen Swallow <http://code.google.com/p/unladen-swallow/>`_ and `PyPy <http://codespeak.net/pypy/dist/pypy/doc/>`_.

Background
~~~~~~~~~~

As the person responsible for large swaths of multiple database support, I am intimately familiar with the architecture of the ORM, the code itself, and the various concerns that need to be accounted for (pickleability, etc.).

Rationale
~~~~~~~~~

Non-relational databases tend to support some subset of the operations supported by relational databases, so it should be possible to perform that common subset on all databases through the same API. Some people are of the opinion that we shouldn't bother supporting these databases because they can't perform all operations. I'm of the opinion that the abstraction is already a little leaky, so we may as well exploit this to provide a common API where possible, which also gives users of these databases the admin and model forms for free.

Method
~~~~~~

The ORM architecture currently has a ``QuerySet``, which is backend-agnostic; a ``Query``, which is SQL-specific; and a ``SQLCompiler``, which is backend-specific (e.g. Oracle vs. MySQL vs. generic). The plan is to make ``Query`` backend-agnostic by delaying the creation of SQL-specific structures, in particular join and alias data.
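To make the proposed split concrete, here is a minimal, self-contained sketch that does not import Django. ``RawFilter``, ``AgnosticQuery``, ``ToySQLCompiler`` and ``ToyDocumentCompiler`` are hypothetical names, not Django APIs, and the schema convention (a reverse foreign key column ``<relation>.<model>_id``) and the three supported lookups are assumptions made for illustration. The query object only records raw lookups in call order, tagging each ``filter()`` call with a group counter; joins are computed solely by the SQL-style compiler, while the document-style compiler emits MongoDB-style operator documents and raises for anything that would need a join.

```python
from dataclasses import dataclass, field
from typing import Any

# Assumed subset of lookup types; a trailing "__gt" etc. names the lookup,
# otherwise the lookup defaults to "exact".
LOOKUPS = {"exact", "gt", "lt"}
SQL_OPS = {"exact": "=", "gt": ">", "lt": "<"}
MONGO_OPS = {"exact": "$eq", "gt": "$gt", "lt": "$lt"}


@dataclass
class RawFilter:
    path: list   # e.g. ["entry", "headline"]: relation names, then a column
    lookup: str  # e.g. "exact" or "gt"
    value: Any   # assumed already prepared (cf. Field.get_prep_value)
    group: int   # "table counter": lookups from one filter() call share it


@dataclass
class AgnosticQuery:
    """Hypothetical backend-agnostic Query: raw lookups in call order, no aliases."""
    model: str
    filters: list = field(default_factory=list)
    next_group: int = 0

    def add_filter_call(self, **lookups):
        group = self.next_group  # one group per filter() call
        self.next_group += 1
        for key, value in lookups.items():
            parts = key.split("__")
            lookup = parts.pop() if parts[-1] in LOOKUPS else "exact"
            self.filters.append(RawFilter(parts, lookup, value, group))
        return self


class ToySQLCompiler:
    """Computes joins only at compile time, keyed by (parent, relation, group)."""

    def compile(self, query):
        joins, join_sql, where, params = {}, [], [], []
        for f in query.filters:
            *relations, column = f.path
            alias, model = query.model, query.model
            for rel in relations:
                key = (alias, rel, f.group)
                if key not in joins:
                    joins[key] = f"T{len(joins) + 1}"
                    # Assumed many-valued (reverse FK) schema: <rel>.<model>_id -> <model>.id
                    join_sql.append(f"JOIN {rel} {joins[key]} ON {joins[key]}.{model}_id = {alias}.id")
                alias, model = joins[key], rel
            where.append(f"{alias}.{column} {SQL_OPS[f.lookup]} %s")
            params.append(f.value)
        sql = " ".join([f"SELECT {query.model}.* FROM {query.model}", *join_sql])
        if where:
            sql += " WHERE " + " AND ".join(where)
        return sql, params


class ToyDocumentCompiler:
    """Non-relational backend: no joins, so it raises rather than guessing."""

    def compile(self, query):
        spec = {}
        for f in query.filters:
            if len(f.path) > 1:
                raise NotImplementedError("joins are not supported: " + "__".join(f.path))
            spec.setdefault(f.path[0], {})[MONGO_OPS[f.lookup]] = f.value
        return spec


# Same call: both conditions share one join on "entry".
same = AgnosticQuery("blog").add_filter_call(entry__headline="Lennon", entry__rating__gt=3)
# Separate calls: each condition gets its own join.
separate = AgnosticQuery("blog").add_filter_call(entry__headline="Lennon").add_filter_call(entry__rating__gt=3)
print(ToySQLCompiler().compile(same))
print(ToySQLCompiler().compile(separate))
print(ToyDocumentCompiler().compile(AgnosticQuery("entry").add_filter_call(headline="Lennon", rating__gt=3)))
```

With both lookups in one call, the SQL-style compiler emits a single ``entry`` join (``T1``); splitting them across two calls yields two joins (``T1`` and ``T2``), which mirrors the different semantics of filters across a many-valued relationship. A real compiler would also consult relation metadata, for instance to share joins on single-valued relations, and a non-relational backend could emulate some joins instead of raising.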

Instead of structures like ``self.where``, ``self.join_aliases``, and ``self.select`` all working in terms of joins and table aliases, the composition of a query would be stored as a tree containing the "raw" filters as passed to the filter calls, with things like ``Field.get_prep_value`` applied appropriately. The ``SQLCompiler`` will be responsible for computing the joins for all of these data structures.

The major complications are operations where ordering matters, for example ``filter()`` and ``annotate()``. Because the order of these operations affects the result, it is imperative that the new structures continue to preserve the ordered semantics of these methods. A related complication is that filters across a many-valued relationship have different semantics when they're in the same call to ``filter()`` as opposed to separate calls. In the current ``Query`` this is represented by using different table aliases. Because the new structure doesn't deal in aliases yet, all values should instead be annotated with a table "counter" indicating which values need to be on the same join once joins are computed. This is a bit of a leaky abstraction, but that's life. Note that joins don't have to be explicitly marked as being different, only as being the same (i.e. the ``SQLCompiler`` can choose to reuse, reorder, or do anything else it likes to generate efficient SQL).

For operations that aren't supported by a backend (e.g. a JOIN on a non-relational backend, or ``extra`` SQL on a non-SQL backend), it is the backend's responsibility to raise an appropriate exception, or to attempt to emulate the operation in some way (for example, some JOINs can be emulated with nested IN queries).

Timeline
~~~~~~~~

This timeline is way coarser than I'd like; consider it a work in progress.

* 2 weeks - update all ``Query`` methods to store data in a backend-agnostic manner.
* 4 weeks - update ``SQLCompiler`` to correctly generate SQL from the new structures, in particular by migrating the JOIN generation logic.
* 2 weeks - begin working on a backend for a non-relational database (probably MongoDB).
* 3 weeks - deal with bugs as they come up; at a guess, these will mostly be related to the semantics of inserts and updates.

Deliverables
~~~~~~~~~~~~

* Refactored ORM ``Query`` and ``SQLCompiler`` classes.
* A working MongoDB backend (to live outside of the core) supporting:

  * Native lookups (MongoDB supports most "basic" lookup types)
  * Creation/update
  * Deletion
  * Working forms (should fall out naturally)

Reality
~~~~~~~

Applications aren't magically going to start working on databases they weren't designed to work with. Using a non-relational database requires a fundamental change of mindset; the point of this project is to be able to use the same API where possible, and to get access to things like the admin and forms.

A note on the admin
~~~~~~~~~~~~~~~~~~~

The admin's fundamental operations are list, create, and update. These should fall out naturally for every working backend. However, some admin features can subtly require more advanced backend operations. Specifically, ``list_filter`` and ``search_fields`` require a backend that supports the queries they generate. If a user tries to use these features with a backend that doesn't support them, the expected result is for the backend to raise an exception. This isn't a great user interface, but having the admin query the backend for this information would result in both code bloat and a terrible dependency inversion (the backend should be responsible for knowing what operations it can perform). Ultimately this is a case where it is the developer's responsibility to know what they are doing.

Comments, criticism, Nobel prize nominations, and letter bombs welcome,

Alex