Re: DescriptorFields status/Manager API change

Adrian Holovaty Thu, 26 Jan 2006 08:29:26 -0800

On 1/25/06, Jacob Kaplan-Moss <[EMAIL PROTECTED]> wrote:
> > Also, it would be cool if the _set stuff wasn't forced on us, there's
> > a disconnect between creating an attribute called 'sites' in your
> > model, and accessing that attribute via 'site_set'.
>
> I *very* much agree with this.
>
> Consider::
>
>      class Article(models.Model):
>          headline = models.CharField(maxlength=50)
>          reporter = models.ForeignKey(Reporter)
>          sites = models.ManyToManyField(Site)
>
> The fact that I can't do ``article.sites.all()`` seems wrong -- is
> there a reason for the foo_set name over the given field name that
> I'm missing?  This also deals with the edge case where I have
> something like::


The reason is consistency. If we enforce the "_set" thing (or whatever
other name we come up with), that means there's a simple rule to
remember:

   * To access related objects, whether they're one-to-many or
many-to-many, just use the attribute called "modelname_set" on your
object, where modelname is the lowercase name of the related model.

If, instead, we allow the article.sites thing (instead of
article.site_set), that introduces a special-case for many-to-many
relationships.

Make sense? But, yeah, I realize that the "sites" attribute name in
that model is essentially meaningless. Any other ideas?

> Second, I'm still unhappy that it's not clear when a database lookup
> is performed.  Luckily, I think the proposed syntax makes it easy to
> write a few rules so that you'll always know when the db is being
> hit.  How does this sound:
>
>      1. ``Manager.get`` always perfoms a lookup.
>      2. Any other manager method only actually hits the database when
> iterated.
>      3. ``object.foreign_key`` always performs a lookup.
>      4. ``object.m2m_set`` is esentially a manager, so it behaves as
> for 1 and 2.

I was thinking this would behave exactly as before --

    1. Manager.get() always performs a lookup.
    2. Any other default manager method only actually hits the
database when iterated (or repr()'d?).
    3. object.foreign_key performs a lookup only the first time --
unless select_related=True was used on the lookup of the parent
object, in which case the value would already be cached.
    4. object.m2m_set is esentially a manager, so it behaves as for 1 and 2.

> There is a question of how/weather to cache database lookups::
>
>      article = Article.objects.get(pk=1)
>      article.reporter.fname  # obvious performs a lookup
>      article.reporter.lname  # does this?

No, that second one wouldn't perform a lookup. This would work exactly
as before, just like article.get_reporter().fname and
article.get_reporter().lname -- it's cached the first time it's
accessed.

> I'd expect that subsequent accesses to the same foreign key do not
> perform additional lookups, but if that's so there needs to be a way
> to clear the cached object in case you have an object you need to
> hang on to for a while.  I'd suggest ``article.clear_relation_cache()``.

We haven't had this up to this point, but -- sure. :)

> There's also the question of caching many to many lookups; my gut is
> that Django shouldn't::
>
>      list(article.site_set)  # hits the db
>      list(article.site_set)  # does it again

I believe Django currently *does* cache many-to-many lookups, so I'd
been thinking it would continue to do so -- but I don't really mind
either way.

> Finally, it seems like the manager has a lot of methods, and I think
> some of them are redundant.  For one, ``Reporter.objects.all
> ().distinct()`` strikes me as less obvious than ``Reporter.objects.all
> (distinct=True)``

Two reasons for making distinct a method instead of a keyword argument --

1. As Hugo pointed out, this makes it possible to chain queries.
2. It makes it possible to leave the "__exact" off, because we can be
positive any keyword argument to filter() is a field lookup, not a
meta argument such as "distinct" or "limit". For example, if you have
a field name called "distinct", you'd be able to do
Reporter.objects.filter(distinct='blah'), which would be the
equivalent of Reporter.objects.filter(distinct__exact='blah').
Granted, this is more of a side benefit than a reason *to* do it, but
I like it because it lets us trim the "__exact", which is the most
common lookup type.

> In fact, I have trouble understanding why we need ``all()`` at all,
> actually; what's the difference between these two?
>
>      Reporter.objects.all()
>      Reporter.objects.filter()
>
> Why not rename "filter" to "list" (or "iterator" if we want to be
> pedantic) and collapse the two methods into one?

It's for readability, like Hugo said, but I'm not 100% sold on it
either way. Come to think of it, get() accepts filter arguments, so,
if we have a list(), it'd be nice and consistent if list() also
accepted filter arguments.

> Similarly, the chaining of filter/order_by/in_bulk seems crufty to
> me; are these equivalent?
>
>      Reporter.objects.filter(fname="Joe").order_by("fname")
>      Reporter.objects.order_by("fname").filter(fname="Joe")
>
> I'm a big fan of the "One True Way" principle, so that rubs me the
> wrong way.  Why not simply have order_by/distinct/etc. be kwargs to
> the manager functions?  If there's a good reason to do the "chained
> methods" that's fine, but let's not be too clever for our own sakes, eh?

Again, it's all about chaining the methods to be able to pass around
the query object, and limiting the filter() arguments to remove the
"meta" arguments.

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com | chicagocrime.org

Re: DescriptorFields status/Manager API change

Reply via email to