I took a quick look at your patch. I don't have more time now, so just some quick comments: - In general, the approach where aggregates are just expressions sounds and looks valid. - I would not worry about the extra time used in djangobench. However, profiling why there is extra time used is always recommended. - I am a bit scared of the type coercions. The reason is that this could prove to be hopelessly complex to get right in every case. However, I do not have concrete examples where this is in fact a problem. The default should probably not be an exception, but just returning what the database happens to give you back.
I think the approach you have taken is correct in general. I would encourage to check if you can somewhat easily incorporate the conditional aggregate support (#11305) into the ExpressionNode based aggreagates. It does not belong into the same patch, but is a good sanity check if the approach taken is extensible. [Following is a bit off-topic] I wonder if the ExpressionNode itself should be refactored into a public API. This way you could easily write your own SQL snippets injectable into the query. This could be custom aggregates, or this could be just NULLS LAST order by clauses. The reason I bring this up is that in the long run, adding more and more special case support to the ORM (conditional aggregates, different SQL functions) doesn't seem to be the right way forward. Once you get expression composition in, you only have 90% of SQL constructs left... Spend the time in building support for user writable SQL snippets, so that they can use just the SQL they want. In my opinion NULLS LAST/FIRST support is a great example: it is common enough that users need it from time to time, but it is not common enough to spend the time to support this special case. Why not just: qs.order_by(SQL('%s NULLS LAST', F('pub_date')) and you now got support for _any_ order by clause the user wishes to use. Replaces extra(), but in a cleaner way. The above could support relabel_aliases(). Or you could write it just as qs.order_by(SQL('pub_date NULLS LAST')) if you don't care for relabel aliases support. For the F-expression support in aggregates this would mean you get actually not just F expression support in aggregates, but any SQL snippet can be injected into the aggregates, for example Sum(SQL('case when person.age > friend.age then 1 else 0 end')) - Anssi ________________________________________ From: django-developers@googlegroups.com [django-developers@googlegroups.com] On Behalf Of Nate Bragg [jonathan.br...@alum.rpi.edu] Sent: Wednesday, March 21, 2012 01:27 To: django-developers@googlegroups.com Subject: Complex aggregate and expression composition. Hello all, Since even before I saw Alex Gaynor's presentation "I hate the Django ORM" (the one with the `Sum(F("end_time") - F("start_time"))` query), the problem of complex aggregates and expressions has vexed me. So, I figured I would try to solve it. Originally, I started this trying to pursue a solution to ticket #14030, but after I took a couple of lousy shots at it, it dawned on me that the ticket would be better resolved as a result of solving the more general case. I realized that aggregates were just a special case of expressions, and that the best solution was going to take a refactoring of Aggregate into ExpressionNode. I have uploaded my branch; it can be found here: https://github.com/NateBragg/django/tree/14030 Since this is a non-trivial change, I was hoping to open the topic for debate here, and get some feedback before proposing my solution for inclusion. Some particular points of note: * I tried to preserve as much interface as possible; I didn't know how much was considered to be more public, so generally I tried to add rather than subtract. However, I did remove a couple things - if you see something missing that shouldn't be, let me know. * Currently, I'm getting the entire test suite passed on sqlite, oracle, mysql, postgres, and postgis. I was unable to test on oracle spatial - any help with that would be appreciated. * When fields are combined, they are coerced to a common type; IntegerFields are coerced to FloatFields, which are coerced into DecimalFields as needed. Any other kinds of combinations must be of the same types. Also, when coerced to a DecimalField, the precision is limited by the original DecimalField. If this is not correct, or other coercions should be added, I'd like to correct that. * When joins are required, they tend to be LEFT OUTER; I'd like some feedback on this, as I'm not 100% sure its always the best behavior. * As the aggregates are a little more complicated, on trivial cases there is a minor reduction in performance; using djangobench, I measured somewhere between a 3% and 8% increase in runtime. * I don't have enough tests - mostly for a lack of creativity. What kind of composed aggregates and expressions would you like to see? I'll add tests for them. Thank you all, and sorry for the above being a short book. -- Nate Bragg -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en. -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.