On Wed, Apr 7, 2010 at 4:43 PM, Waldemar Kornewald <wkornew...@gmail.com> wrote:
> On Wed, Apr 7, 2010 at 5:12 PM, Alex Gaynor <alex.gay...@gmail.com> wrote:
>> No. I am vehemently opposed to attempting to extensively emulate the
>> features of a relational database in a non-relational one. People
>> talk about the "object relational" impedance mismatch, much less the
>> "object-relational non-relational" one. I have no interest in
>> attempting to support any attempts at emulating features that just
>> don't exist on the databases they're being emulated on.
>
> This decision has to be based on the actual needs of NoSQL developers.
> Did you actually work on non-trivial projects that needed
> denormalization and in-memory JOINs and manually maintained counters?
> I'm not making this up. The "dumb" key-value store API is not enough.
> People are manually writing lots of code for features that could be
> handled by an SQL emulation layer. Do we agree until here?
>

No, we don't. People are designing their data in ways that fit their
datastore. If all people did was implement a relational model in
userland code on top of non-relational databases then they'd really be
missing the point.

> Then, the question boils down to: Is the ORM the right place to handle
> those features?
>
> We see more advantages in moving those features into the ORM instead
> of some separate API:
> No matter whether you do denormalization or an in-memory JOIN, you end
> up emulating an SQL-like JOIN. When you're maintaining a counter you
> again do a simple and very common operation supported by SQL:
> counting. Django's ORM already provides that functionality. Django's
> current reusable apps already use that functionality. Developers
> already know Django's ORM and thus also that functionality. By moving
> these features into the ORM
> * existing Django apps will either work directly on NoSQL or at least
> be much easier to port

Not a design concern. People expecting programs designed for totally
separate data models to work should expect to be disappointed. Unless
you're using the limited subset of features supported by all
databases, of course.

> * Django apps written for NoSQL will be portable across all NoSQL DBs
> without any code changes and in the worst case require only minor
> changes to switch to SQL
> * the resulting code is shorter and easier to understand than with a
> separate API which would only add another layer of indirection you'd
> have to think about *every* (!) single time you work with models (and
> if you have to think about this while writing model code you end up
> with potentially a lot more bugs, as is actually the case in practice)
> * developers won't have to use and learn a different models API (you'd
> only need to learn an API for specifying "optimization" rules, but the
> models would still be the same)
>

Uhh, the whole point of this is that there is only a single API.

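For readers who have not written this kind of code, the in-memory JOIN and the manually maintained counter mentioned in the quoted text look roughly like the following. This is only a minimal sketch, assuming plain Python dicts stand in for a key-value store; the `authors` and `posts` data and both function names are hypothetical and not part of Django or of any NoSQL client library.

```python
# Hypothetical data: two "tables" in a key-value store, keyed by primary key.
authors = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
posts = {
    10: {"author_id": 1, "title": "Intro"},
    11: {"author_id": 1, "title": "Follow-up"},
    12: {"author_id": 2, "title": "Reply"},
}


def join_posts_with_authors(posts, authors):
    """In-memory JOIN: resolve each post's author by primary-key lookup."""
    return [
        {"title": post["title"], "author": authors[post["author_id"]]["name"]}
        for _, post in sorted(posts.items())
    ]


def add_post(posts, counters, pk, author_id, title):
    """Save a post and update the manually maintained per-author counter."""
    posts[pk] = {"author_id": author_id, "title": title}
    counters[author_id] = counters.get(author_id, 0) + 1


# Build the counter once from existing data, then keep it in sync on writes.
post_counts = {}
for post in posts.values():
    post_counts[post["author_id"]] = post_counts.get(post["author_id"], 0) + 1

add_post(posts, post_counts, 13, 2, "Another reply")
joined = join_posts_with_authors(posts, authors)
```

Every write path has to remember to call something like `add_post`, which is the bookkeeping the quoted proposal wants to move out of application code.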
> App Engine's indexes are not that different from what we propose. Like
> many other NoSQL DBs, the datastore doesn't create indexes for all
> possible queries. Sometimes you'll need a composite index to make
> certain queries work. On Cassandra, CouchDB, Redis, and many other
> "crippled" NoSQL DBs you solve this problem by maintaining even the
> most trivial DB indexes with manually written indexing *code* (and I
> mean *anything* that filters on fields other than the primary key). I
> bet five years ago database developers would've called anyone nuts
> who'd seriously suggest that nonsense, but somehow the NoSQL hype
> makes developers forget about productivity. Anyway, on App Engine,
> instead of writing code for those trivial indexes you add a simple
> index definition to your index.yaml (actually, it's automatically
> generated for you based on the queries you execute) and suddenly the
> normal query API supports the respective filter rules transparently
> (with exactly the same API; this is in strong contrast to Cassandra,
> etc. which also make you manually write code for traversing those
> manually implemented indexes! basically, they make you implement a
> little specialized DB for every project and this is no joke, but the
> sad truth). Now, our goal is to bring App Engine's indexing
> definitions to the next level and allow to specify denormalization and
> other "advanced" indexing rules which make more complicated queries
> work transparently, again via the same API that everyone already
> knows.
>
> Instead of seeing this as object-relational non-relational mapping you
> should see this as an object-relational mapping for a type of DB that
> needs explicitly specified indexing rules for complex queries (which,
> if you really think about it, exactly describes what working with
> NoSQL DBs is like).
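
To make the "manually written indexing code" in the quoted paragraph concrete, here is a rough sketch of what maintaining and traversing a secondary index by hand can look like. It assumes a toy in-memory store that only supports lookup by primary key; `KVStore`, `save`, and `find_by` are hypothetical names and do not correspond to the API of Cassandra, CouchDB, Redis, or App Engine.

```python
class KVStore:
    """Toy key-value store (hypothetical): native lookups are by primary key only."""

    def __init__(self):
        self.rows = {}
        # Manually maintained index: (field, value) -> set of primary keys.
        self.index = {}

    def save(self, pk, row, indexed_fields):
        # Drop stale index entries when an existing row is overwritten.
        old = self.rows.get(pk)
        if old is not None:
            for field in indexed_fields:
                self.index.get((field, old.get(field)), set()).discard(pk)
        self.rows[pk] = row
        for field in indexed_fields:
            self.index.setdefault((field, row.get(field)), set()).add(pk)

    def find_by(self, field, value):
        # Traverse the hand-written index instead of scanning every row.
        return [self.rows[pk] for pk in sorted(self.index.get((field, value), ()))]


store = KVStore()
store.save(1, {"city": "Berlin"}, ["city"])
store.save(2, {"city": "Paris"}, ["city"])
store.save(1, {"city": "Paris"}, ["city"])  # update moves pk 1 in the index

in_berlin = store.find_by("city", "Berlin")
in_paris = store.find_by("city", "Paris")
```

Both the index upkeep in `save` and the traversal in `find_by` are application code here, which is the part the quoted text contrasts with a declared index definition.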
>
>>> In addition to these changes you'll also need to take care of a few
>>> other things:
>>>
>>> Many NoSQL DBs provide a simple "upsert"-like behavior where on save()
>>> they either create a new entity if none exists with that primary key
>>> or update the existing entity if one exists. However, on save() Django
>>> first checks if an entity exists. This would be inefficient and
>>> unnecessary, so the backend should be able to turn that behavior off.
>>>
>>> On delete() Django also deletes related objects. This can be a costly
>>> operation, especially if you have a large number of entities. Also,
>>> the queries that collect the related entities can conflict with
>>> transaction support at least on App Engine and it might also be very
>>> inefficient on HBase. IOW, it's not sufficient to let the user handle
>>> the deletion for large datasets. So, non-relational (and maybe also
>>> relational) DBs should be able to defer and split up the deletion
>>> process into background tasks - which would also simplify the
>>> developer's job because he doesn't have to take care of manually
>>> writing background tasks for large datasets, so it's a good addition
>>> in general.
>>>
>>
>> There is separate work on another ticket to provide a way to declare
>> ON_DELETE behavior, though this is a bit of a relational concept it
>> seems to me making these easy to customize provides a good way for
>> different backends to specify their behavior here.
>
> Hmm, I'm not sure. The requirement is that this works transparently on
> all DBs (without manually changing ForeignKeys). The proposed setting
> ON_DELETE_HANDLED_BY_DB comes close, but it's still not the same
> because we still need Django's code for collecting the related objects
> (just at a later point and in groups of maybe 100 entities, so it can
> be distributed across multiple background task runs).
>
>>> I'm not sure how to handle multi-table inheritance.
>>> It could be done with JOIN emulation, but this would be very inefficient.
>>> Denormalization is IMHO not the answer to this problem, either. Should
>>> Django simply fail to execute such a query on those backends or should
>>> the user make sure that he doesn't use multi-table inheritance
>>> unnecessarily in his code?
>>>
>>
>> There's nothing about MTI that's inherently hard on a non-relational
>> database, besides not being able to "select_related" the parent.
>
> What if you filter on one field defined in the parent class and
> another field defined on the child class? Emulating this query would
> be either very inefficient and (for large datasets) possibly return no
> results, at all, or require denormalization which I'd find funny in
> the case of MTI because it brings us back to single-table inheritance,
> but it might be the only solution that works efficiently on all NoSQL
> DBs.
>

Filters on base fields can be implemented fairly easily on databases
with IN queries. Otherwise I suppose it raises an exception.

> Bye,
> Waldemar Kornewald
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To post to this group, send email to django-develop...@googlegroups.com.
> To unsubscribe from this group, send email to
> django-developers+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-developers?hl=en.
>
>

Alex

--
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me