Hey Alex,

On Apr 7, 2:11 am, Alex Gaynor <alex.gay...@gmail.com> wrote:
> Non-relational database support for the Django ORM
> ==================================================
>
> Note: I am withdrawing my proposal on template compilation. Another student
> has expressed some interest in working on it, and in any event I am now more
> interested in working on this project.

It's great that you want to work on this project. Since I want to see this feature in Django, I'm offering mentoring help with the NoSQL part. You know Django's ORM better than me, so I probably can't really help you there, but I can help make sure that your modifications will work well on NoSQL DBs. Just in case this is necessary, I'll apply as a GSoC mentor before it's too late (if I remember correctly, in 2007 we could still allow new mentors even at this late stage).

> Method
> ~~~~~~
>
> The ORM architecture currently has a ``QuerySet`` which is backend agnostic, a
> ``Query`` which is SQL specific, and a ``SQLCompiler`` which is backend
> specific (i.e. Oracle vs. MySQL vs. generic). The plan is to change ``Query``
> to be backend agnostic by delaying the creation of structures that are SQL
> specific, specifically join/alias data. Instead of structures like
> ``self.where``, ``self.join_aliases``, or ``self.select`` all working in terms
> of joins and table aliases the composition of a query would be stored in terms
> of a tree containing the "raw" filters, as passed to the filter calls, with
> things like ``Field.get_prep_value`` called appropriately. The ``SQLCompiler``
> will be responsible for computing the joins for all of these data-structures.

Could you please elaborate on the data structures? In the end, non-relational backends shouldn't have to reproduce large parts of the SQLQuery code just to emulate a JOIN. When we tried to do a similar refactoring we quickly faced the problem that we needed something similar to setup_joins() and other SQLQuery features. We'd also have to create code for grouping filters into individual queries on tables. The Query class should take care of as much of the common stuff as possible, so nonrel backends can potentially emulate every single SQL feature (e.g., via MapReduce or whatever) with the least effort.
Otherwise this refactoring would actually have more disadvantages than our current SQLCompiler-based approach in Django-nonrel (as ridiculous as that sounds).

However, it's important that all of the emulated features are handled not by the backend, but by a reusable code layer which sits on top of the nonrel backends. It would be wasteful to let every backend developer write his own JOIN emulation, denormalization, aggregate code, etc. The refactored ORM should at least still allow for writing some kind of "proxy" backend that sits on top of the actual nonrel backend and takes care of emulating SQL features. I'm not sure if it's a good idea to integrate the emulation into Django itself because then progress will be slowed down.

Ideally, we should provide a simplified API for nonrel backends, similar to the one that we recently published for Django-nonrel, so a backend could be written in two days instead of two weeks. We can port our work over to the refactored ORM, so you don't have to deal with this (except if it should be officially integrated into Django).

In addition to these changes you'll also need to take care of a few other things:

Many NoSQL DBs provide a simple "upsert"-like behavior where on save() they either create a new entity if none exists with that primary key or update the existing entity if one exists. However, on save() Django first checks if an entity exists. This would be inefficient and unnecessary, so the backend should be able to turn that behavior off.

On delete() Django also deletes related objects. This can be a costly operation, especially if you have a large number of entities. Also, the queries that collect the related entities can conflict with transaction support, at least on App Engine, and they might also be very inefficient on HBase. IOW, it's not sufficient to let the user handle the deletion for large datasets.
So, non-relational (and maybe also relational) DBs should be able to defer and split up the deletion process into background tasks. This would also simplify the developer's job because he doesn't have to take care of manually writing background tasks for large datasets, so it's a good addition in general.

I'm not sure how to handle multi-table inheritance. It could be done with JOIN emulation, but this would be very inefficient. Denormalization is IMHO not the answer to this problem, either. Should Django simply fail to execute such a query on those backends or should the user make sure that he doesn't use multi-table inheritance unnecessarily in his code?

Bye,
Waldemar Kornewald
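
P.S. To make the deferred deletion idea a bit more concrete, here is a rough Python sketch of deleting in batches via background tasks. None of this is existing Django or Django-nonrel API: delete_in_batches, fetch_keys, delete_keys and enqueue are hypothetical names, and the dict and deque below only stand in for a real datastore and task queue. It's just meant to show the shape of the thing, not how it would have to be implemented.

```python
from collections import deque


def delete_in_batches(fetch_keys, delete_keys, enqueue, batch_size=100):
    """Delete at most one batch, then schedule a follow-up task if needed.

    All three callables are hypothetical hooks a backend would supply:
      fetch_keys(limit) -> list of up to `limit` keys still to be deleted
      delete_keys(keys) -> removes those entities from the datastore
      enqueue(task)     -> schedules a zero-argument callable to run later
    """
    keys = fetch_keys(batch_size)
    if not keys:
        return 0
    delete_keys(keys)
    # A full batch means there may be more left, so defer the rest
    # instead of doing all the work in the current request.
    if len(keys) == batch_size:
        enqueue(lambda: delete_in_batches(
            fetch_keys, delete_keys, enqueue, batch_size))
    return len(keys)


def demo(total=250, batch_size=100):
    """Run the sketch against in-memory stand-ins (illustration only)."""
    store = {i: "entity-%d" % i for i in range(total)}  # fake datastore
    queue = deque()                                     # fake task queue

    def fetch_keys(limit):
        return list(store)[:limit]

    def delete_keys(keys):
        for key in keys:
            del store[key]

    # The "request" only deletes the first batch ...
    delete_in_batches(fetch_keys, delete_keys, queue.append, batch_size)

    # ... and a worker drains the follow-up tasks afterwards.
    tasks_run = 0
    while queue:
        task = queue.popleft()
        task()
        tasks_run += 1
    return len(store), tasks_run


if __name__ == "__main__":
    remaining, tasks_run = demo()
    print("remaining entities:", remaining)
    print("background tasks run:", tasks_run)
```

The point of the design is that the user's delete() call would only pay for the first batch, and each follow-up task stays small enough to fit whatever limits the backend imposes. Collecting related objects could be chained in the same way, but I left that out to keep the sketch short.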