Non-relational database support for the Django ORM
==================================================

Note:  I am withdrawing my proposal on template compilation.  Another student
has expressed some interest in working on it, and in any event I am now more
interested in working on this project.

About Me
~~~~~~~~

I'm a sophomore computer science student at Rensselaer Polytechnic Institute.
I'm a frequent contributor to Django (including last year's successful multiple
database GSoC project) and other related projects; I'm also a committer on both
`Unladen Swallow <http://code.google.com/p/unladen-swallow/>`_ and
`PyPy <http://codespeak.net/pypy/dist/pypy/doc/>`_.

Background
~~~~~~~~~~

As the person responsible for large swaths of multiple database support, I am
intimately familiar with the architecture of the ORM, the code itself, and the
various concerns that need to be accounted for (pickleability, etc.).

Rationale
~~~~~~~~~

Non-relational databases tend to support some subset of the operations that are
supported on relational databases; therefore it should be possible to perform
those operations on all databases.  Some people are of the opinion that we
shouldn't bother to support these databases because they can't perform all
operations.  I'm of the opinion that the abstraction is already a little leaky,
so we may as well exploit this for a common API where possible, as well as
giving users of these databases the admin and model forms for free.

Method
~~~~~~

The ORM architecture currently has a ``QuerySet`` which is backend agnostic, a
``Query`` which is SQL specific, and a ``SQLCompiler`` which is backend
specific (i.e. Oracle vs. MySQL vs. generic).  The plan is to change ``Query``
to be backend agnostic by delaying the creation of SQL-specific structures,
in particular join/alias data.  Instead of structures like ``self.where``,
``self.join_aliases``, or ``self.select`` all working in terms of joins and
table aliases, the composition of a query would be stored as a tree
containing the "raw" filters, as passed to the filter calls, with things like
``Field.get_prep_value`` called appropriately.  The ``SQLCompiler`` will then
be responsible for computing the joins from these data structures.
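
As a rough illustration of what such a tree could look like, here is a minimal
sketch.  Every name in it (``Leaf``, ``Node``, ``AgnosticQuery``,
``parse_filter``, ``KNOWN_LOOKUPS``) is hypothetical and not part of Django; it
also skips the ``Field.get_prep_value`` step and only shows that nothing about
joins or aliases needs to be stored at this stage.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import Any, List, Union

    LOOKUP_SEP = "__"
    # Hypothetical, deliberately small set of lookup types for this sketch.
    KNOWN_LOOKUPS = {"exact", "gt", "gte", "lt", "lte", "in", "icontains"}


    @dataclass
    class Leaf:
        path: tuple   # field names as written by the user, e.g. ("author", "name")
        lookup: str   # lookup type, e.g. "exact" or "gte"
        value: Any    # the raw value (a real version would prepare it first)


    @dataclass
    class Node:
        connector: str = "AND"
        negated: bool = False
        children: List[Union["Node", Leaf]] = field(default_factory=list)


    def parse_filter(key, value):
        """Split ``author__name__gte`` into a path and a lookup type."""
        parts = key.split(LOOKUP_SEP)
        lookup = "exact"
        if len(parts) > 1 and parts[-1] in KNOWN_LOOKUPS:
            lookup = parts.pop()
        return Leaf(tuple(parts), lookup, value)


    class AgnosticQuery:
        """Stores filters exactly as given; no joins or table aliases."""

        def __init__(self):
            self.filters = Node()

        def add_filter(self, negated=False, **kwargs):
            # One child node per filter()/exclude() call keeps call order intact.
            leaves = [parse_filter(k, v) for k, v in kwargs.items()]
            self.filters.children.append(Node(negated=negated, children=leaves))


    query = AgnosticQuery()
    query.add_filter(author__name="Alex", pub_date__gte=2010)
    query.add_filter(negated=True, title__icontains="draft")

A SQL compiler would walk ``query.filters`` and work out the joins implied by
each ``path``, while a non-relational compiler could walk the same tree and
translate it into its own query format.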

The major complications are operations where ordering matters, for example
``filter()`` and ``annotate()``.  Because the order of these operations matters
it is imperative that the structures continue to maintain the ordered semantics
of these methods.  Another example is that filters across a multi-valued
relationship have different semantics when they're in the same call to
``filter()`` as opposed to separate calls.  In the current ``Query`` this is
represented by using different table aliases.  Because the new structure
doesn't deal in aliases yet, all values should instead be annotated with a table
"counter" indicating that, once joins are computed, two different values need to
be on the same join.  This is a bit of a leaky abstraction, but that's life.
It should be noted that joins don't have to be explicitly marked as being
different, only the same (i.e. the ``SQLCompiler`` can choose to reuse,
reorder, or do anything else it likes to efficiently generate SQL).
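
To make the "counter" idea concrete, here is a small sketch under heavy
simplifying assumptions: every filter key has the form ``relation__field``, and
the names ``Condition``, ``Query.filter`` and ``assign_aliases`` are invented
for this illustration rather than taken from the ORM.  Each ``filter()`` call
draws one counter value, and the compiler step only guarantees that conditions
sharing a relation and a counter end up on the same alias.

.. code-block:: python

    from collections import namedtuple
    from itertools import count

    # "group" is the table counter: one value per filter() call.
    Condition = namedtuple("Condition", "relation field value group")


    class Query:
        def __init__(self):
            self.conditions = []
            self._groups = count(1)

        def filter(self, **kwargs):
            group = next(self._groups)
            for key, value in kwargs.items():
                # Assumes keys look like "relation__field" (sketch only).
                relation, field = key.split("__", 1)
                self.conditions.append(Condition(relation, field, value, group))
            return self


    def assign_aliases(conditions):
        """Compiler-side step: same (relation, group) means same alias.

        This sketch takes the simplest route and hands out a fresh alias
        for every new (relation, group) pair.
        """
        aliases = {}
        for cond in conditions:
            key = (cond.relation, cond.group)
            if key not in aliases:
                aliases[key] = "T%d" % (len(aliases) + 1)
        return [
            (aliases[(c.relation, c.group)], c.field, c.value)
            for c in conditions
        ]


    same_call = Query().filter(entry__title="Django", entry__year=2010)
    separate_calls = Query().filter(entry__title="Django").filter(entry__year=2010)

With these definitions, both conditions in ``same_call`` land on alias ``T1``,
whereas ``separate_calls`` puts the second condition on ``T2``.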

For operations that aren't supported by a backend (e.g. a JOIN on a
non-relational backend, or ``extra`` SQL on non-SQL backends) it is the
backend's responsibility to raise the appropriate exception, or to attempt to
emulate the operation in some way (e.g. some JOINs can be emulated with nested
IN queries).
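
A sketch of how a non-relational compiler might refuse an operation it cannot
perform follows.  The ``UnsupportedOperation`` exception and the
``compile_filters`` function are hypothetical names, and the input is assumed to
be a list of ``(path, lookup, value)`` tuples; only the ``$``-prefixed operator
names are real MongoDB query operators.

.. code-block:: python

    class UnsupportedOperation(Exception):
        """Hypothetical exception for operations a backend cannot perform."""


    # Lookup types this sketch knows how to express as MongoDB operators.
    MONGO_OPERATORS = {
        "exact": "$eq",
        "gt": "$gt",
        "gte": "$gte",
        "lt": "$lt",
        "lte": "$lte",
        "in": "$in",
    }


    def compile_filters(filters):
        """Turn (path, lookup, value) tuples into a MongoDB-style query dict."""
        document = {}
        for path, lookup, value in filters:
            if len(path) > 1:
                # Spanning a relation would need a JOIN, which this backend
                # does not attempt to emulate.
                raise UnsupportedOperation(
                    "filtering on %r requires a JOIN" % "__".join(path)
                )
            if lookup not in MONGO_OPERATORS:
                raise UnsupportedOperation("unsupported lookup %r" % lookup)
            conditions = document.setdefault(path[0], {})
            conditions[MONGO_OPERATORS[lookup]] = value
        return document


    age_query = compile_filters([(("age",), "gte", 18), (("age",), "lt", 65)])

    try:
        compile_filters([(("author", "name"), "exact", "Alex")])
    except UnsupportedOperation as exc:
        join_error = str(exc)

A backend that preferred emulation could, instead of raising, run a first query
on the related collection and feed the resulting keys into an ``$in``
condition.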

Timeline
~~~~~~~~

This timeline is way coarser than I'd like; consider it a work in progress.

 * 2 weeks - update all ``Query`` methods to store data in a backend agnostic
   manner.
 * 4 weeks - update ``SQLCompiler`` to correctly generate SQL from the
   structures, specifically migrate the JOIN generation logic.
 * 2 weeks - begin working on a backend for a non-relational database (probably
   MongoDB).
 * 3 weeks - deal with bugs as they come up; at a guess, these will mostly be
   related to the semantics of inserts and updates.

Deliverables
~~~~~~~~~~~~

 * Refactored ORM ``Query`` and ``SQLCompiler`` classes.
 * A working MongoDB backend (to live outside of the core) supporting:

   * Native lookups (MongoDB supports most "basic" lookup types)
   * Creation/update
   * Deletion
   * Working forms (should fall out naturally)

Reality
~~~~~~~

Applications aren't magically going to start working on databases they
weren't designed to work with.  Using a non-relational database requires a
fundamental change of mindset; the point of this is to be able to use the same
API where possible, and get access to things like the admin and forms.

A note on the admin
~~~~~~~~~~~~~~~~~~~

The admin's fundamental operations are list, create, and update.  These should
fall out naturally for all backends that work.  However, there are some
operations that can subtly require more advanced backend operations.
Specifically, ``list_filter`` and ``search_fields`` require backends that
support the queries that they generate.  In cases where a user tries to use
these features with a backend that doesn't support them the expected result is
for the backend to raise an exception.  This isn't a great user-interface, but
the admin attempting to query the backend for this information results in both
code bloat, and a terrible dependency inversion (i.e. the backend should be
responsible for knowing what operations it can perform).  Ultimately this is a
case where it is the developer's responsibility to know what they are doing.


Comments, criticism, Nobel prize nominations, and letter bombs welcome,
Alex
