Rather than watch the "inherit from User" thread go round and round,
maybe I should give people something more concrete to think about.

This is a follow-up to the mail I sent late on Friday. It describes the
areas where incorporating model inheritance needs API additions or some
kind of semi-major change, and where some developer feedback would be a
good idea. I am only talking about the API here, not the implementation
(you will notice there is no patch attached to this email).

Model inheritance (as I've implemented it) comes in two varieties,
differentiated by the way they store the data at the db level and by the
way you use them at the Python level.

-----------------------
1. Abstract Base class
-----------------------
One use-case for subclassing is to use the base class as a place to
store common fields and functionality. It is purely a "factor out the
common stuff" holder and you don't ever intend to use the base class on
its own. In the terminology of other languages (e.g. C++, Java), the
base class is an abstract class.

At the database level, it makes sense to store the fields from the base
class in the same table as the fields from the derived class. You will
not be instantiating the base class (or querying it) on its own, so we
can view the subclassed model as a flattened version of the parent +
child. For example,

        class Thing(models.Model):
           name = models.CharField(...)
        
        class Animal(Thing):
           genus = models.CharField(...)
        
        class Toy(Thing):
           manufacturer = models.CharField(...)
        
would generate two database tables (Thing does not get its own table).
The Animal table would have "name" and "genus" columns (along with the
standard extras like "id") and Toy would have "name" and "manufacturer"
columns.

We need a way to tell Django that "Thing" is an abstract class. I would
propose

        class Thing(models.Model):
           name = ...
        
           class Meta:
              is_abstract = True
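
To make the flattening concrete, here is a rough, Django-free sketch of
how the columns for each table could be worked out. Everything in it
(Field, Model, own_fields, table_columns, and the class-level
is_abstract flag standing in for the proposed Meta.is_abstract) is
made up for illustration and is not what the patch does internally.

```python
# Hypothetical stand-ins for the model machinery; none of this is Django API.
class Field:
    """Marks a class attribute as a database column."""
    def __init__(self, kind):
        self.kind = kind


class Model:
    is_abstract = False  # assumed flag, mirroring the proposed Meta.is_abstract


def own_fields(cls):
    """Fields declared directly on cls, in declaration order."""
    return {name: f for name, f in vars(cls).items() if isinstance(f, Field)}


def table_columns(cls):
    """Column names for cls's table, or None if cls is abstract (no table)."""
    # Look only at the class's own namespace so the flag is not inherited.
    if vars(cls).get("is_abstract", False):
        return None
    columns = ["id"]
    # Walk from the most basic class down to cls, folding in the fields
    # of abstract parents and of cls itself.
    for klass in reversed(cls.__mro__):
        if klass is cls or vars(klass).get("is_abstract", False):
            columns.extend(own_fields(klass))
    return columns


class Thing(Model):
    is_abstract = True
    name = Field("char")


class Animal(Thing):
    genus = Field("char")


class Toy(Thing):
    manufacturer = Field("char")


for model in (Thing, Animal, Toy):
    print(model.__name__, table_columns(model))
```

Note that the flag is read from the class's own namespace, so Animal and
Toy do not become abstract just because Thing is.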
        
Points To Ponder
================
(1) What notation to use to declare a class as abstract? I've thrown out
Meta.is_abstract -- any preference for alternatives?

(2) Any strong reason not to include this case? It's only for "advanced
use", since there are a few ways to shoot yourself in the foot (e.g.
declare a class abstract, create the tables, remove the abstract
declaration, watch code explode). However, it will be useful in some
cases. [I have some scripts to help with converting non-abstract
inheritance to abstract and vice-versa at the database level, too.]

---------------------------
2. "Pythonic" Inheritance
---------------------------
The traditional Python inheritance model allows one to create instances
of the base class and to work with instances of derived classes as
though they were instances of the parent. Extending this to Django's
querying model, we should be able to run queries against the parent
class:

        Thing.objects.filter(name="horse")
        
in the above example and, through the magic of duck typing, have the
right sort of object returned.

Amazingly enough, this is also supported in my code. It naturally
follows the multi-table model of doing inheritance (in the above
classes, Animal would have a foreign key reference to the Thing table,
and similarly for Toy). Multiple inheritance works as well.

In order to make duck typing work without doing tons of extra database
queries, it is necessary to have a mapping between each row and the type
of object it ultimately represents. So a row in the Thing table for an
Animal object would say "I am an Animal" (we already know it's a Thing,
because it's in the Thing table). The reverse direction (a row in the
Animal table is also a Thing) is easy because we have the model
description at the Python level and the foreign key constraint at the
database level (the latter being more of a theoretical construct if
you're using SQLite, but that's not a showstopper). When thinking about
this, realise that it works smoothly for multi-layer inheritance: it
only takes two queries to get all the data for any object, even if you
start by querying the top table in the hierarchy.

I have currently implemented this "downwards" reference as an extra
column in each table that is a foreign key to the ContentType table (to
head off one question: a GenericForeignKey field adds nothing here).
Every single table would need this column because you never know when
somebody is going to subclass that model and every query needs to
retrieve that value as well.
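
As a rough illustration of the column-per-table approach, here is a
sketch using plain sqlite3 rather than Django. The table layout, the
derived_type column name and the get_thing helper are all assumptions
made for the example. In particular, derived_type is a plain text
column here, standing in for the foreign key to ContentType, and only
a single level of inheritance is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        derived_type TEXT NOT NULL  -- assumed name; stands in for the ContentType FK
    );
    CREATE TABLE animal (
        thing_id INTEGER PRIMARY KEY REFERENCES thing(id),
        genus TEXT NOT NULL
    );
    CREATE TABLE toy (
        thing_id INTEGER PRIMARY KEY REFERENCES thing(id),
        manufacturer TEXT NOT NULL
    );
    INSERT INTO thing VALUES (1, 'horse', 'animal');
    INSERT INTO thing VALUES (2, 'yo-yo', 'toy');
    INSERT INTO animal VALUES (1, 'Equus');
    INSERT INTO toy VALUES (2, 'Duncan');
""")


class Thing:
    def __init__(self, pk, name):
        self.pk, self.name = pk, name


class Animal(Thing):
    def __init__(self, pk, name, genus):
        super().__init__(pk, name)
        self.genus = genus


class Toy(Thing):
    def __init__(self, pk, name, manufacturer):
        super().__init__(pk, name)
        self.manufacturer = manufacturer


# Maps the value of the downward reference to (Python class, extra column).
DERIVED = {"animal": (Animal, "genus"), "toy": (Toy, "manufacturer")}


def get_thing(conn, name):
    """Fetch one row from the top table and return the most-derived object."""
    # Query 1: the parent row, including the downward type reference.
    row = conn.execute(
        "SELECT id, name, derived_type FROM thing WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    pk, name, derived = row
    if derived == "thing":
        return Thing(pk, name)
    cls, column = DERIVED[derived]  # unknown types raise KeyError
    # Query 2: the derived row. Table and column names come from the
    # fixed DERIVED mapping above, never from user input.
    extra = conn.execute(
        f"SELECT {column} FROM {derived} WHERE thing_id = ?", (pk,)
    ).fetchone()
    return cls(pk, name, extra[0])


obj = get_thing(conn, "horse")
print(type(obj).__name__, obj.genus)  # prints: Animal Equus
```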

The alternative is to have a separate table that has columns for
content_type, pk_value, derived_content_type that performs the same
function. The drawback of this is that every single query needs to do a
join with this second table, because we need to know the derived class
in order to create the right type of object back in Python land (a query
on the Thing table for something that is ultimately an Animal should
return an Animal instance; that is how Python works and we don't want to
go and introduce C++-style casts).
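
For comparison, the separate-table alternative might look something
like the following sqlite3 sketch. The inheritance_map table name and
the things_with_types helper are invented for the example, and the
content types are plain strings rather than ContentType rows. The
point is only that the join shows up in every query on the parent
table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Hypothetical mapping table holding the downward references.
    CREATE TABLE inheritance_map (
        content_type TEXT NOT NULL,
        pk_value INTEGER NOT NULL,
        derived_content_type TEXT NOT NULL,
        PRIMARY KEY (content_type, pk_value)
    );
    INSERT INTO thing VALUES (1, 'horse');
    INSERT INTO thing VALUES (2, 'yo-yo');
    INSERT INTO inheritance_map VALUES ('thing', 1, 'animal');
    INSERT INTO inheritance_map VALUES ('thing', 2, 'toy');
""")


def things_with_types(conn, name):
    """Return (id, name, derived type) rows for Things with this name."""
    # The join is needed even for the simplest filter, because the
    # derived type no longer lives on the thing table itself.
    return conn.execute(
        """
        SELECT t.id, t.name, m.derived_content_type
        FROM thing AS t
        JOIN inheritance_map AS m
          ON m.content_type = 'thing' AND m.pk_value = t.id
        WHERE t.name = ?
        """,
        (name,),
    ).fetchall()


print(things_with_types(conn, "horse"))
```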

Points To Ponder
================
(3) I think having the downward reference column (the one that specifies
the type of the most-derived object) as a column on each table is the
right approach. Anybody have strong arguments for the other approach (a
separate table)?

        - we can ship a script that helps with conversion from existing
        tables to the new structure. There is no strict requirement on
        what this new column is called -- it can be configured on a
        per-model basis so that we don't restrict anybody's particular
        column choices.

-----------------------
3. What you don't get
-----------------------
I am avoiding PostgreSQL's table inheritance feature. It is not a
standard feature on databases, so we have to do inheritance inside
Django anyway and having to maintain both the Django version and the
PostgreSQL-specific version will lead to errors (we already have enough
per-database specific stuff to discourage anybody from wanting more at a
whim).

I am not implementing the "everything in one table" storage model. It is
not universally applicable (you can't inherit from third-party models,
for a start) and I have software engineering objections to it. It also
doesn't add much value at this point in time. If somebody wants that
later, that is for later.

I am not doing anything view-based. Most databases compute views on
demand, so they don't add any real performance value and they hide some
complexity (the complexity is already hidden from the Python user
anyway, but hiding it from the developer trying to fix core bugs is not
a good plan).

All of the above features can be worked on by others if they like (free
and open source and all that).

If we can get the query SQL generation stuff sorted out next week when a
few of us are in one place, fixing the model inheritance patch to work
with that is all that remains to do before I can submit it. Currently I
do get bitten by some query generation bugs, so it's not an either/or
situation.

Malcolm

