Re: [GSOC] NoSQL Support for the ORM

Alex Gaynor Wed, 07 Apr 2010 08:22:45 -0700

On Wed, Apr 7, 2010 at 6:47 AM, Russell Keith-Magee
<freakboy3...@gmail.com> wrote:
> On Wed, Apr 7, 2010 at 8:11 AM, Alex Gaynor <alex.gay...@gmail.com> wrote:
>> Non-relational database support for the Django ORM
>> ==================================================
>>
>> Note:  I am withdrawing my proposal on template compilation.  Another student
>> has expressed some interest in working on it, and in any event I am now more
>> interested in working on this project.
>>
>> About Me
>> ~~~~~~~~
>>
>> I'm a sophomore computer science student at Rensselaer Polytechnic Institute.
>> I'm a frequent contributor to Django (including last year's successful
>> multiple database GSoC project) and other related projects; I'm also a
>> committer on both
>> `Unladen Swallow <http://code.google.com/p/unladen-swallow/>`_ and
>> `PyPy <http://codespeak.net/pypy/dist/pypy/doc/>`_.
>>
>> Background
>> ~~~~~~~~~~
>>
>> As the person responsible for large swaths of multiple database support, I am
>> intimately familiar with the architecture of the ORM, the code itself, and
>> the various concerns that need to be accounted for (pickleability, etc.).
>>
>> Rationale
>> ~~~~~~~~~
>>
>> Non-relational databases tend to support some subset of the operations that
>> are supported on relational databases, so it should be possible to perform
>> that subset of operations on all databases.  Some people are of the opinion
>> that we shouldn't bother to support these databases because they can't
>> perform all operations.  I'm of the opinion that the abstraction is already a
>> little leaky, so we may as well exploit this for a common API where possible,
>> as well as giving users of these databases the admin and model forms for free.
>>
>> Method
>> ~~~~~~
>>
>> The ORM architecture currently has a ``QuerySet`` which is backend agnostic,
>> a ``Query`` which is SQL specific, and a ``SQLCompiler`` which is backend
>> specific (i.e. Oracle vs. MySQL vs. generic).  The plan is to change
>> ``Query`` to be backend agnostic by delaying the creation of structures that
>> are SQL specific, specifically join/alias data.  Instead of structures like
>> ``self.where``, ``self.join_aliases``, or ``self.select`` all working in
>> terms of joins and table aliases, the composition of a query would be stored
>> in terms of a tree containing the "raw" filters, as passed to the filter
>> calls, with things like ``Field.get_prep_value`` called appropriately.  The
>> ``SQLCompiler`` will be responsible for computing the joins for all of these
>> data structures.
>
> I can see the intention here, and I can see how this approach could be
> used to solve the problem. However, my initial concern is that normal
> SQL users will end up carrying around a lot of extra overhead so that
> they can support backends that they will never use.
>
> Have you given any thought to how complex the datastructures inside
> Query will need to be, and how complex and/or expensive the conversion
> process will be?
>


I see no reason they need to be any more complex than the current
ones.  You have a tree that represents filters, combining what is
currently split into where and having.  This means that the SQLCompiler
is responsible for splitting these up, which I think will make fixing
some other bugs easier (e.g. disjunction with a filter on aggregates
currently doesn't work).  There's already quite a lot of stuff that's
computed later, such as select_related's transformation into JOINs.
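
To make the idea concrete, here is a minimal sketch of a combined filter
tree and the split a compiler would perform.  All of the names here
(Filter, Node, references_aggregate, split_where_having) are hypothetical
and are not existing Django code; the rule for deciding what goes into
HAVING is also an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple, Union


@dataclass
class Filter:
    """One raw lookup as passed to filter(), e.g. num_comments__gt=10."""
    path: str
    lookup: str
    value: object


@dataclass
class Node:
    """A boolean combination ("AND" or "OR") of filters or nested nodes."""
    connector: str
    children: List[Union["Node", Filter]] = field(default_factory=list)


def references_aggregate(item, aggregates: Set[str]) -> bool:
    """Return True if any leaf under item filters on an aggregate alias."""
    if isinstance(item, Filter):
        return item.path.split("__")[0] in aggregates
    return any(references_aggregate(child, aggregates) for child in item.children)


def split_where_having(root: Node, aggregates: Set[str]) -> Tuple[Node, Node]:
    """Split the combined tree into (where, having), as a SQL compiler might.

    Only a top-level AND can be split child by child.  Anything else,
    including an OR that mixes plain and aggregate filters, moves as a unit.
    """
    where, having = Node("AND"), Node("AND")
    parts = root.children if root.connector == "AND" else [root]
    for part in parts:
        target = having if references_aggregate(part, aggregates) else where
        target.children.append(part)
    return where, having


# A query filtering on a plain column AND (an aggregate OR a plain column).
tree = Node("AND", [
    Filter("published", "exact", True),
    Node("OR", [
        Filter("num_comments", "gt", 10),
        Filter("title", "startswith", "A"),
    ]),
])
where, having = split_where_having(tree, {"num_comments"})
```

The backend-agnostic part only stores the tree; a non-SQL compiler would
never call anything like split_where_having and could walk the same tree
directly.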

> Other issues that spring to mind:
>
>  * What about nonSQL datatypes? List/Set types are a common feature of
> Non-SQL backends, and are The Right Way to solve a whole bunch of
> problems. How do you propose to approach these datatypes? What (if
> any) overlap exists between the use of set data types and m2m? Is
> there any potential overlap between supporting List/Set types and
> supporting Arrays in SQL?
>

Is there overlap between List/Set and Arrays in SQL?  Probably.  In my
opinion, once we have a good clean separation of concerns in the
architecture, there's no reason that implementing a ListField would be
particularly hard.  If we happened to include one in Django, all the
better (from the perspective of interoperability).

>  * How does a non-SQL backend integrate with syncdb and other setup
> tools? What about inspectdb?
>

Most, but not all, non-relational databases don't require table setup
the way relational DBs do.  MongoDB doesn't require anything at all;
by contrast, Cassandra requires an XML configuration file.  How to
handle these is a little touchy, but basically I think syncdb should
stay conceptually pure, generating "tables"; if extra config is needed,
backends should ship custom management commands.

As for inspectdb, it only really makes sense on backends that have
structured "tables", so those could implement it, and other backends
could punt.

>  * What about basic connection management? Is the existing Connection
> API likely to be compatible, or will modifications be required?
>

No, it's not.  Non-relational databases aren't bound by PEP 249 and
thus have wildly incompatible APIs.  However, we cheated a bit with
multi-db: compilers are responsible for actually executing their
queries, so they can already deal with the inconsistencies here.
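
As a rough illustration of why that helps, here is a sketch of two
compilers that accept the same filters but talk to very different
clients.  The class names and the execute(filters) signature are
hypothetical, not Django's compiler API; sqlite3 stands in for any
PEP 249 driver and a plain list of dicts stands in for a document
store client.

```python
import sqlite3


class SQLCompilerSketch:
    """Turns equality filters into SQL and runs them via a PEP 249 cursor."""

    def __init__(self, connection, table):
        self.connection = connection
        self.table = table

    def execute(self, filters):
        # Column and table names are trusted in this sketch; values are
        # passed as parameters.
        clauses = " AND ".join(f"{column} = ?" for column in filters)
        sql = f"SELECT id, name FROM {self.table}"
        if clauses:
            sql += f" WHERE {clauses}"
        cursor = self.connection.cursor()
        cursor.execute(sql, list(filters.values()))
        return [{"id": row[0], "name": row[1]} for row in cursor.fetchall()]


class DocumentCompilerSketch:
    """Runs the same filters against dicts, with no cursor or SQL involved."""

    def __init__(self, documents):
        self.documents = documents

    def execute(self, filters):
        return [
            doc for doc in self.documents
            if all(doc.get(key) == value for key, value in filters.items())
        ]


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
connection.executemany(
    "INSERT INTO person (id, name) VALUES (?, ?)",
    [(1, "Alex"), (2, "Russ")],
)

documents = [{"id": 1, "name": "Alex"}, {"id": 2, "name": "Russ"}]

sql_rows = SQLCompilerSketch(connection, "person").execute({"name": "Alex"})
doc_rows = DocumentCompilerSketch(documents).execute({"name": "Alex"})
```

The caller only ever sees execute(), so whatever sits above the compiler
doesn't need to know whether a cursor exists at all.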

>  * Why the choice of MongoDB specifically? Do you have particular
> experience with MongoDB? Does MongoDB have features that make it a
> good choice?
>

MongoDB offers a wide range of filtering options, which from my
perspective means it presents a greater test of the flexibility of the
developed APIs.  For this reason GAE would also be a good choice.
Something like Riak or Cassandra, which basically only have native
support for get(pk=3) would be a poor test of the flexibility of the
API.

>  * Given that you're only proposing a single proof-of-concept backend,
> have you given any thought to the needs of other backends? It's not
> hard to envisage that Couch, Cassandra, GAE etc will all have slightly
> different requirements and problems. Is there a common ground that
> exists between all data store backends? If there isn't, how do you
> know that what you are proposing will be sufficient to support them?
>

To a certain extent this is a matter of knowing the feature sets of the
databases and, hopefully, having a mentor who is knowledgeable about
them.  The reality is that, under the GSOC time constraints, attempting
to write complete backends for multiple databases would probably be
impossible.

> There's also the issue of specificity in your proposal; I'll take you
> at your word that what you have proposed is a draft that requires
> elaboration.
>
> Yours,
> Russ Magee %-)
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Django developers" group.
> To post to this group, send email to django-develop...@googlegroups.com.
> To unsubscribe from this group, send email to 
> django-developers+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/django-developers?hl=en.
>
>

Alex

-- 
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me

