I took a quick look at your patch. I don't have more time now, so just some 
quick comments:
  - In general, the approach where aggregates are just expressions sounds and 
looks
    valid.
  - I would not worry about the extra time used in djangobench. However, 
profiling why
    there is extra time used is always recommended.
  - I am a bit scared of the type coercions. The reason is that this could 
prove to be
    hopelessly complex to get right in every case. However, I do not have 
concrete
    examples where this is in fact a problem. The default should probably not 
be an exception,
    but just returning what the database happens to give you back.

I think the approach you have taken is correct in general. I would encourage to 
check
if you can somewhat easily incorporate the conditional aggregate support 
(#11305)
into the ExpressionNode based aggreagates. It does not belong into the same
patch, but is a good sanity check if the approach taken is extensible.


[Following is a bit off-topic]
I wonder if the ExpressionNode itself should be refactored into a public API. 
This way
you could easily write your own SQL snippets injectable into the query. This 
could be
custom aggregates, or this could be just NULLS LAST order by clauses.

The reason I bring this up is that in the long run, adding more and more 
special case
support to the ORM (conditional aggregates, different SQL functions) doesn't 
seem to be
the right way forward. Once you get expression composition in, you only have 
90% of
SQL constructs left...

Spend the time in building support for user writable SQL snippets, so that they 
can
use just the SQL they want. In my opinion NULLS LAST/FIRST support is a great
example: it is common enough that users need it from time to time, but it is 
not common
enough to spend the time to support this special case. Why not just:
qs.order_by(SQL('%s NULLS LAST', F('pub_date'))
and you now got support for _any_ order by clause the user wishes to use. 
Replaces
extra(), but in a cleaner way. The above could support relabel_aliases(). Or 
you could
write it just as qs.order_by(SQL('pub_date NULLS LAST')) if you don't care for 
relabel
aliases support.

For the F-expression support in aggregates this would mean you get actually not 
just
F expression support in aggregates, but any SQL snippet can be injected into the
aggregates, for example Sum(SQL('case when person.age > friend.age then 1 else 
0 end'))

 - Anssi
________________________________________
From: django-developers@googlegroups.com [django-developers@googlegroups.com] 
On Behalf Of Nate Bragg [jonathan.br...@alum.rpi.edu]
Sent: Wednesday, March 21, 2012 01:27
To: django-developers@googlegroups.com
Subject: Complex aggregate and expression composition.

Hello all,

Since even before I saw Alex Gaynor's presentation "I hate the Django ORM"
(the one with the `Sum(F("end_time") - F("start_time"))` query), the problem
of complex aggregates and expressions has vexed me. So, I figured I would
try to solve it.

Originally, I started this trying to pursue a solution to ticket #14030, but 
after
I took a couple of lousy shots at it, it dawned on me that the ticket would be
better resolved as a result of solving the more general case. I realized that
aggregates were just a special case of expressions, and that the best solution
was going to take a refactoring of Aggregate into ExpressionNode.

I have uploaded my branch; it can be found here:

https://github.com/NateBragg/django/tree/14030

Since this is a non-trivial change, I was hoping to open the topic for debate
here, and get some feedback before proposing my solution for inclusion.

Some particular points of note:
* I tried to preserve as much interface as possible; I didn't know how much
  was considered to be more public, so generally I tried to add rather than
  subtract. However, I did remove a couple things - if you see something
  missing that shouldn't be, let me know.
* Currently, I'm getting the entire test suite passed on sqlite, oracle, mysql,
  postgres, and postgis. I was unable to test on oracle spatial - any help
  with that would be appreciated.
* When fields are combined, they are coerced to a common type;
  IntegerFields are coerced to FloatFields, which are coerced into
  DecimalFields as needed. Any other kinds of combinations must be of
  the same types. Also, when coerced to a DecimalField, the precision
  is limited by the original DecimalField. If this is not correct, or other
  coercions should be added, I'd like to correct that.
* When joins are required, they tend to be LEFT OUTER; I'd like some
  feedback on this, as I'm not 100% sure its always the best behavior.
* As the aggregates are a little more complicated, on trivial cases there
  is a minor reduction in performance; using djangobench, I measured
  somewhere between a 3% and 8% increase in runtime.
* I don't have enough tests - mostly for a lack of creativity. What kind of
  composed aggregates and expressions would you like to see? I'll add
  tests for them.

Thank you all, and sorry for the above being a short book.

-- Nate Bragg

--
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

Reply via email to