Why would that require a fancy query optimiser? It doesn’t seem like it would be much more difficult than modifying our function binding rules and having a specialised version.
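Concretely, the rewrite being discussed might look like this, assuming a hypothetical table `t` with partition key `pk` and a `list<int>` column `l` (the table is invented for illustration, and UNNEST subqueries do not exist in CQL today, so the first form is purely hypothetical syntax):

```sql
-- Naive form: would deserialize the whole collection just to count
-- its elements (UNNEST is hypothetical, not implemented).
SELECT COUNT(e) FROM (SELECT UNNEST(l) AS e FROM t WHERE pk = 0);

-- Specialized form: COLLECTION_COUNT can answer from the element-count
-- prefix of the serialized collection, without full deserialization.
SELECT COLLECTION_COUNT(l) FROM t WHERE pk = 0;
```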
> On 9 Dec 2022, at 13:26, Andrés de la Peña <adelap...@apache.org> wrote: > > > Note that specialized collection functions are also an opportunity for > optimization. For example, COLLECTION_COUNT reads only the first bytes of a > serialized collection, since those bytes contain the number of elements in > that collection. The simplest implementation of COUNT(UNNEST(collection)) > wouldn't do that. It would probably require deserializing the entire > collection. A fancy query optimizer could internally translate > COUNT(UNNEST(collection)) to COLLECTION_COUNT(collection) to get a nice > performance improvement. Unfortunately we don't have such an optimizer at the > moment. > > I don't see a reason not to leave those collection functions as they are, > renaming aside. They can be later complemented by the generic UNNEST, or > subqueries, when someone is willing to work on those features. > > It's not clear to me how UNNEST + UDA would operate on maps. We would still > need a way to extract the keys or the values of maps, like the current > MAP_KEYS and MAP_VALUES functions, wouldn't we? > > As for the impossibility of applying COLLECTION_MAX, COLLECTION_MIN, etc. to > maps, I wouldn't be against renaming those to SET_MAX, LIST_MAX, SET_MIN, > LIST_MIN, etc. Sets and lists have many things in common and it's a pity that > we don't have a common name for them. This lack of a common name for lists > and sets is something that permeates the code in multiple places. This > kind of problem is probably the reason why Java's maps aren't collections. > > > > > >> On Fri, 9 Dec 2022 at 11:26, Benedict <bened...@apache.org> wrote: >> Right, this is basically my view - it can be syntactic sugar for UNNEST >> subqueries as and when we offer those (if not now), but I think we should be >> able to apply any UDA or aggregate to collections with some syntax that’s >> ergonomic. 
>> >> I don’t think APPLY is the right way to express it, my version was >> MAX(column AS COLLECTION), which means: bind this operator to the collection >> rather than the rows (and since this is CAST-like, I’d say this is also a >> reasonable way to apply aggregations to single values too) >> >> But perhaps there’s some other third option. Or, if not, let’s simply >> support UNNEST subqueries. >> >> >>>> On 9 Dec 2022, at 11:19, Claude Warren, Jr via dev >>>> <dev@cassandra.apache.org> wrote: >>>> >>> >>> I still think that semantically it makes sense to have a function that >>> applies an aggregate to various collection types. So rather than building >>> ARRAY_MAX, do APPLY(MAX, column) or APPLY(MAX(column)); it is clear what is >>> being requested, and APPLY can be the source of truth for which aggregate >>> functions work on which column types. >>> >>>> On Fri, Dec 9, 2022 at 10:28 AM Andrés de la Peña <adelap...@apache.org> >>>> wrote: >>>> Indeed this discussion is useful now that we know that there is dissension >>>> about these changes. However, as those changes were happening none of the >>>> people involved in them felt the need for a DISCUSS thread, and I opened >>>> this thread as soon as Benedict objected to the changes. I think this is >>>> perfectly in line with ASF's wise policy about lazy consensus: >>>> https://community.apache.org/committers/lazyConsensus.html >>>> >>>>> My point here was that none I could find use ARRAY_COUNT - either >>>>> ARRAY_SIZE or ARRAY_LENGTH >>>> >>>> A quick search on Google shows: >>>> https://www.ibm.com/docs/en/psfa/7.2.1?topic=functions-array-count >>>> https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/arrayfun.html#fn-array-count >>>> https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Array/ARRAY_COUNT.htm >>>> >>>>> We seem to be mixing and matching our databases we use as templates here. 
>>>>> Most of the standard examples we use (Postgres, MySQL etc) do not offer >>>>> this, and do not offer equivalent functions on any of their other >>>>> collection types. >>>> >>>> I don't know what's wrong with those databases, nor what makes MySQL and >>>> Postgres more standard than others. AFAIK MySQL doesn't even have an array >>>> type, so it is hardly going to have array functions. Postgres however does >>>> have arrays. Those arrays can be manipulated with both subqueries and an >>>> unnest function. However, this doesn't prevent Postgres from also having a set >>>> of array functions: https://www.postgresql.org/docs/12/functions-array.html >>>> >>>> It seems that it is difficult to find a DB supporting arrays that doesn't >>>> also offer an assorted set of array functions. DBs can perfectly support >>>> subqueries, unnesting and utility functions at the same time. For example, >>>> with Postgres you can get the size of an array with a subquery, or with >>>> UNNEST, or with the ARRAY_LENGTH function. >>>> >>>> The collection functions that we discuss here are mostly analogous to >>>> those sets of functions that we find anywhere. They are a quite small, >>>> well-encapsulated and non-invasive feature that has the advantage of being >>>> already done. Those functions don't prevent us from adding support for >>>> subqueries or unnesting whenever someone wants to work on them. >>>> >>>> Adding subqueries at a later stage wouldn't involve the deprecation of the >>>> collection functions, since those are still useful as a shortcut, as we >>>> see in other DBs out there. >>>> >>>> >>>> >>>>> On Thu, 8 Dec 2022 at 20:47, J. D. Jordan <jeremiah.jor...@gmail.com> >>>>> wrote: >>>>> I think this thread proves the point that a DISCUSS thread for API >>>>> changes on dev@ will get more viewpoints than just having something in >>>>> JIRA. I think this thread has been useful and should result in us having >>>>> a better user facing API than without it. 
>>>>> >>>>>>> On Dec 8, 2022, at 1:57 PM, Andrés de la Peña <adelap...@apache.org> >>>>>>> wrote: >>>>>>> >>>>>> >>>>>>> I expect we’ll rehash it every API thread otherwise. >>>>>> >>>>>> Since you bring up the topic, I understand that opposing every single >>>>>> reviewed decision that has been taken on CASSANDRA-17811, >>>>>> CASSANDRA-8877, CASSANDRA-17425 and CASSANDRA-18085 could make an >>>>>> argument in favour of the policy demanding a DISCUSS thread for every >>>>>> new feature, big or small. One could argue that any change to the work >>>>>> on this area that comes out of the current discussion could have been >>>>>> prevented by a previous DISCUSS thread. >>>>>> >>>>>> However, those tickets have been around for a long time in a very public way, >>>>>> and there hasn't been any controversy around them until now. So I think >>>>>> that an initial thread wouldn't have prevented anyone from resurrecting >>>>>> the discussion at any point in time if they haven't put in the time to look >>>>>> into the changes before. We have already seen those after-commit >>>>>> discussions even for discussed, voted, approved and reviewed CEPs. The >>>>>> only thing we would get is two discussion threads instead of one. By the >>>>>> looks of it, I doubt that the suggested policy about discuss threads is >>>>>> going to be accepted. >>>>>> >>>>>> In any case, this is a separate topic from what we're discussing here. >>>>>> >>>>>> >>>>>>> On Thu, 8 Dec 2022 at 18:21, Benedict <bened...@apache.org> wrote: >>>>>>> It feels like this is a recurring kind of discussion, and I wonder if >>>>>>> there’s any value in deciding on a general approach to guide these >>>>>>> discussions in future? Are we aiming to look like SQL as we evolve, and >>>>>>> if so which products do we want to be informed by? >>>>>>> >>>>>>> I expect we’ll rehash it every API thread otherwise. 
>>>>>>> >>>>>>>>> On 8 Dec 2022, at 17:37, Benedict <bened...@apache.org> wrote: >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>> 1) Do they offer ARRAY_SUM or ARRAY_AVG? >>>>>>>>> Yes, a quick search on Google shows some examples: >>>>>>>>> https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/68fdFR3LWhx7KtHc9Iv5Qg >>>>>>>>> https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/gxz1nB7GclxNO5mBd~rn8g >>>>>>>>> https://docs.upsolver.com/sqlake/functions-and-operators-reference/array-functions/array_sum >>>>>>>>> https://docs.firebolt.io/sql-reference/functions-reference/array-sum.html >>>>>>>> >>>>>>>> We seem to be mixing and matching our databases we use as templates >>>>>>>> here. Most of the standard examples we use (Postgres, MySQL etc) do >>>>>>>> not offer this, and do not offer equivalent functions on any of their >>>>>>>> other collection types. >>>>>>>> >>>>>>>> If we are to use industry norms as inspiration or justification, >>>>>>>> subqueries do seem to be the approach taken. So the question is really >>>>>>>> why we should introduce non-standard variants of fairly atypical >>>>>>>> functions instead of the more typical approach of subqueries? Because >>>>>>>> it’s more work, or some other reason? We should be explicit. >>>>>>>> >>>>>>>> If we aren’t using industry norms as our justification, we should >>>>>>>> probably jettison the lot as honestly the landscape is a mess in this >>>>>>>> area. >>>>>>>> >>>>>>>>> >>>>>>>>>> 2) Do they define ARRAY_COUNT or ARRAY_LENGTH? >>>>>>>>> Yes, again we can search for some examples: >>>>>>>>> https://docs.snowflake.com/en/sql-reference/functions/array_size.html >>>>>>>>> https://docs.databricks.com/sql/language-manual/functions/array_size.html >>>>>>>> >>>>>>>> My point here was that none I could find use ARRAY_COUNT - either >>>>>>>> ARRAY_SIZE or ARRAY_LENGTH >>>>>>>> >>>>>>>>>> 3) A map is a collection in C* parlance, but I gather from below you >>>>>>>>>> expect these methods not to operate on them? 
>>>>>>>>> Nope, only COLLECTION_COUNT works on sets, lists and maps. >>>>>>>>> COLLECTION_MIN and COLLECTION_MAX require a set or list. >>>>>>>>> COLLECTION_SUM and COLLECTION_AVG require a numeric collection, the >>>>>>>>> same way that the ARRAY_SUM and ARRAY_AVG functions above require a >>>>>>>>> numeric array. >>>>>>>> >>>>>>>> Should collection_max only work on list and set when we can call >>>>>>>> MAX(map) today (at least I think we can)? Whatever we decide our >>>>>>>> aggregate behaviour is, it should be consistent here. >>>>>>>> >>>>>>>>>> Does ARRAY_MAX operate on single values? If we are to base our >>>>>>>>>> decisions on norms elsewhere, we should be consistent about it. >>>>>>>>> It doesn't in any of the examples above. Those functions aren't a >>>>>>>>> standard so I don't know if there are other DBs around that support >>>>>>>>> it. In any case, the fact that we look for inspiration in other >>>>>>>>> databases to minimize surprise etc. doesn't mean that we have to do >>>>>>>>> exactly the same. After all, CQL is not SQL and our collections >>>>>>>>> aren't SQL arrays. >>>>>>>>> >>>>>>>>> Note that the collection functions added by CASSANDRA-8877 don't >>>>>>>>> operate on single values either. That idea was proposed by Yifan on >>>>>>>>> CASSANDRA-18078 and it looked good to Francisco and me. The patch is >>>>>>>>> on CASSANDRA-18085, already reviewed and blocked waiting on the >>>>>>>>> outcome of this discussion. >>>>>>>>> >>>>>>>>> The new collection functions can do the same as the new MAXWRITETIME >>>>>>>>> function, but not only for getting max timestamps, but also min >>>>>>>>> timestamps and min/max TTLs. The missing part is that MAXWRITETIME can >>>>>>>>> accept both collections and single elements, so callers don't need to >>>>>>>>> know the type of the column. That's the motivation behind the idea of >>>>>>>>> doing the same with the collection functions, so they can entirely >>>>>>>>> replace MAXWRITETIME. 
>>>>>>>>> >>>>>>>>> However I wouldn't be against leaving the collection functions >>>>>>>>> working only on collections, as originally designed, and as they >>>>>>>>> currently are on trunk. The question is what we do with MAXWRITETIME. >>>>>>>>> That function is also only on trunk, and it might be repetitive given >>>>>>>>> the more generic collection functions. It's also a bit odd that there >>>>>>>>> isn't, for example, a similar MINTTL function. Maybe we should start >>>>>>>>> a separate discussion thread about that new function? >>>>>>>> >>>>>>>> I think we should figure out our overall strategy - these are all >>>>>>>> pieces of the puzzle IMO. But I guess the above questions seem to come >>>>>>>> first and will shape this. I would be in favour of some general >>>>>>>> approach, however, such as either first casting to a collection, or >>>>>>>> passing an aggregation operator to WRITETIME. >>>>>>>> >>>>>>>> >>>>>>>>>> On Thu, 8 Dec 2022 at 14:21, Benedict <bened...@apache.org> wrote: >>>>>>>>>> 1) Do they offer ARRAY_SUM or ARRAY_AVG? >>>>>>>>>> 2) Do they define ARRAY_COUNT or ARRAY_LENGTH? >>>>>>>>>> 3) A map is a collection in C* parlance, but I gather from below you >>>>>>>>>> expect these methods not to operate on them? >>>>>>>>>> >>>>>>>>>> Does ARRAY_MAX operate on single values? If we are to base our >>>>>>>>>> decisions on norms elsewhere, we should be consistent about it. >>>>>>>>>> >>>>>>>>>> It’s worth noting that ARRAY is an ISO SQL concept, as is MULTISET. >>>>>>>>>> Some databases also have Set or Map types, such as MySQL’s Set and >>>>>>>>>> Postgres’ hstore. These databases only support ARRAY_ functions, >>>>>>>>>> seemingly, plus special MULTISET operators defined by the SQL >>>>>>>>>> standard where that data type is supported. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> On 8 Dec 2022, at 12:11, Andrés de la Peña <adelap...@apache.org> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> "ARRAY_MAX" and "ARRAY_MIN" functions to get the max/min element in >>>>>>>>>>> a list are not an uncommon practice. You can find them in SparkSQL, >>>>>>>>>>> Amazon Timestream, Teradata, etc. Since we have what we call >>>>>>>>>>> collections instead of arrays, it makes sense to call the analogous >>>>>>>>>>> functions "COLLECTION_MAX", "COLLECTION_MIN", etc. >>>>>>>>>>> >>>>>>>>>>> As for maps, CASSANDRA-8877 also introduced "MAP_KEYS" and >>>>>>>>>>> "MAP_VALUES" functions to get the keys or the values of a map, so >>>>>>>>>>> one can feed them to "MAX", "COLLECTION_MAX", etc. 
That isn't >>>>>>>>>>> anything too original either; you can find identical functions in >>>>>>>>>>> SparkSQL, for example. >>>>>>>>>>> >>>>>>>>>>> I find simple utility functions easier to use than subqueries. But >>>>>>>>>>> we don't have to choose. We can also have subqueries if someone >>>>>>>>>>> finds the time to work on them. >>>>>>>>>>> >>>>>>>>>>>> On Thu, 8 Dec 2022 at 12:04, Claude Warren, Jr via dev >>>>>>>>>>>> <dev@cassandra.apache.org> wrote: >>>>>>>>>>>> I think the semantics of the situation are important here. >>>>>>>>>>>> >>>>>>>>>>>> Let’s take MAX as our example aggregate function. >>>>>>>>>>>> >>>>>>>>>>>> We all expect that in a DB context MAX(column) will return the >>>>>>>>>>>> value of the column with the maximum value. That is the expected >>>>>>>>>>>> semantics of MAX. >>>>>>>>>>>> >>>>>>>>>>>> The question here is that there are data types that are >>>>>>>>>>>> multi-valued and there is a desire to apply MAX to the values >>>>>>>>>>>> within the column. I would expect that this would return the >>>>>>>>>>>> maximum value of the column for every row in the DB. >>>>>>>>>>>> >>>>>>>>>>>> Suppose there were a keyword that operated like the Java BiFunction >>>>>>>>>>>> class, where the function would apply a second function to the >>>>>>>>>>>> column data. For the purposes of this discussion let’s call this >>>>>>>>>>>> function APPLY. >>>>>>>>>>>> >>>>>>>>>>>> So APPLY(MAX, column) would return the maximum value from the >>>>>>>>>>>> column for each row in the DB. >>>>>>>>>>>> >>>>>>>>>>>> MAX(APPLY(MAX, column)) would get the maximum value from the column >>>>>>>>>>>> across all the rows. >>>>>>>>>>>> >>>>>>>>>>>> Similarly, APPLY could be used with other functions: >>>>>>>>>>>> MAX(APPLY(MIN, column)) would get the largest minimum value from the column >>>>>>>>>>>> across all rows. >>>>>>>>>>>> >>>>>>>>>>>> These statements make clear semantically what is being asked for. 
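A sketch of how the proposed APPLY keyword could read, assuming a hypothetical table `t` with a `list<int>` column `l` (APPLY is a proposal in this thread, not implemented syntax, and the table is invented for illustration):

```sql
-- Within each row: the maximum element of that row's collection.
SELECT APPLY(MAX, l) FROM t;

-- Across rows: the largest of the per-row maximums.
SELECT MAX(APPLY(MAX, l)) FROM t;

-- Mixed: the largest of the per-row minimums.
SELECT MAX(APPLY(MIN, l)) FROM t;
```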
>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Dec 8, 2022 at 10:57 AM Benedict <bened...@apache.org> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> I meant unnest, not unwrap. >>>>>>>>>>>>> >>>>>>>>>>>>>>> On 8 Dec 2022, at 10:34, Benedict <bened...@apache.org> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I do not think we should have functions that aggregate across >>>>>>>>>>>>>>> rows and functions that operate within a row use the same name. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I’m sympathetic to that view for sure. I wouldn’t be too >>>>>>>>>>>>>> disappointed by that outcome, and SQL engines seem to take a >>>>>>>>>>>>>> similar approach, however they mostly rely on sub-queries to get >>>>>>>>>>>>>> around this problem, and the SQL standard introduces UNWRAP for >>>>>>>>>>>>>> operating on arrays (by translating them into a table), >>>>>>>>>>>>>> permitting subqueries to aggregate them. It seems to me we have >>>>>>>>>>>>>> four options: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1) introduce functionality similar to UNWRAP and subqueries >>>>>>>>>>>>>> 2) introduce new syntax to permit operating on collections with >>>>>>>>>>>>>> the same functions >>>>>>>>>>>>>> 3) permit the same functions to operate on both, with a >>>>>>>>>>>>>> precedence order, and introduce syntax to permit breaking the >>>>>>>>>>>>>> precedence order >>>>>>>>>>>>>> 4) introduce new functions >>>>>>>>>>>>>> >>>>>>>>>>>>>> (1) might look like SELECT (SELECT MAX(item) FROM UNWRAP(list)) >>>>>>>>>>>>>> AS max_item FROM table >>>>>>>>>>>>>> >>>>>>>>>>>>>> (2) and (3) might look something like: >>>>>>>>>>>>>> >>>>>>>>>>>>>> SELECT MAX(list AS COLLECTION) or >>>>>>>>>>>>>> SELECT MAX(list AS ROWS) >>>>>>>>>>>>>> >>>>>>>>>>>>>> (4) might look something like we have already, but perhaps with >>>>>>>>>>>>>> different names >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The comparator for collections is the lexicographical compare >>>>>>>>>>>>>>> on the collection items >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is a fair point, I 
mistakenly thought it sorted first on >>>>>>>>>>>>>> size. Even this definition is a little funkier for Map types, >>>>>>>>>>>>>> where the values of a key may cause something to sort earlier >>>>>>>>>>>>>> than a map whose next key sorts first. There are multiple >>>>>>>>>>>>>> potential lexicographical sorts for Maps (i.e., by keys first, >>>>>>>>>>>>>> then values, or by (key, value) pairs), so this is particularly >>>>>>>>>>>>>> poorly defined IMO. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The maximum of a blob type is pretty well defined I think, as >>>>>>>>>>>>>> are boolean, inetaddress etc. However, even for List or Set >>>>>>>>>>>>>> collections there are multiple reasonable functions one could >>>>>>>>>>>>>> define for maximum, so it would make more sense to me to permit >>>>>>>>>>>>>> the user to define the comparison as part of the MAX function if >>>>>>>>>>>>>> we are to offer it. However, with the lexicographical definition >>>>>>>>>>>>>> we have I am somewhat less concerned for Set and List. Map seems >>>>>>>>>>>>>> like a real problem though, if we support these operators (which >>>>>>>>>>>>>> perhaps we do not). >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 7 Dec 2022, at 12:13, Andrés de la Peña >>>>>>>>>>>>>>>> <adelap...@apache.org> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The comparator for collections is the lexicographical compare >>>>>>>>>>>>>>> on the collection items. That might not be the most useful >>>>>>>>>>>>>>> thing, but it's not impossible to imagine cases where that >>>>>>>>>>>>>>> ordering can be useful. To give a random example, you can use a >>>>>>>>>>>>>>> list column to store the name and surnames of a person, >>>>>>>>>>>>>>> considering that some people can have multiple surnames. You >>>>>>>>>>>>>>> can then sort rows based on that list column, to get the names >>>>>>>>>>>>>>> in alphabetical order, or to get the first or last person >>>>>>>>>>>>>>> according to that order. 
I'm sure we can think of more cases >>>>>>>>>>>>>>> where the lexicographical order of a list can be useful, >>>>>>>>>>>>>>> although I agree that it's not the greatest feature ever. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It's worth mentioning that collections are not the only data >>>>>>>>>>>>>>> types where the MIN/MAX functions are of dubious utility. For >>>>>>>>>>>>>>> example, blob columns can also be used with MIN/MAX. Same as >>>>>>>>>>>>>>> with collections, the min/max blobs are selected according to >>>>>>>>>>>>>>> the comparator for the data type. That comparator is the >>>>>>>>>>>>>>> lexicographic compare on the unsigned values of the byte >>>>>>>>>>>>>>> contents. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The utility of MIN/MAX on inet and boolean columns isn't very >>>>>>>>>>>>>>> clear either, although one can always imagine use cases. For >>>>>>>>>>>>>>> example, MAX of a boolean column can be used as a logical >>>>>>>>>>>>>>> disjunction. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If we were to special-case MIN/MAX functions to reject >>>>>>>>>>>>>>> collections, we should also reject other data types such as, at >>>>>>>>>>>>>>> least, blobs. That would require a deprecation plan. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Also, it's not that the comparator used by MIN/MAX is an >>>>>>>>>>>>>>> internal obscure thing. The action of that comparator is very >>>>>>>>>>>>>>> visible when any of those data types is used in a clustering >>>>>>>>>>>>>>> column, and it's used as the basis for "ORDER BY" clauses. >>>>>>>>>>>>>>> Should we also reject blobs, collections, tuples and UDTs on >>>>>>>>>>>>>>> "ORDER BY"? I don't think so. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I rather think that basing MIN/MAX on the regular order of the >>>>>>>>>>>>>>> column data type is consistent, easy to do and easy to >>>>>>>>>>>>>>> understand. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I don't see the need to add rules explicitly forbidding some >>>>>>>>>>>>>>> data types on MIN/MAX functions just because we can't easily >>>>>>>>>>>>>>> figure out a use case for their ordering. Especially when we >>>>>>>>>>>>>>> are exposing that same ordering on clusterings and "ORDER BY". >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, 6 Dec 2022 at 18:56, J. D. Jordan >>>>>>>>>>>>>>>> <jeremiah.jor...@gmail.com> wrote: >>>>>>>>>>>>>>>> If the functionality truly has never actually worked, then >>>>>>>>>>>>>>>> throwing an error that MAX is not supported for collections >>>>>>>>>>>>>>>> seems reasonable. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> But we should throw an error, I do not think we should have >>>>>>>>>>>>>>>> functions that aggregate across rows and functions that >>>>>>>>>>>>>>>> operate within a row use the same name. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> My expectation as a user would be that MAX either always >>>>>>>>>>>>>>>> aggregates across rows, so results in a single row of output >>>>>>>>>>>>>>>> or always operates within a row, so returns the full set of >>>>>>>>>>>>>>>> rows matching the query. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So if we want a max that aggregates across rows that works for >>>>>>>>>>>>>>>> collections we could change it to return the aggregated max >>>>>>>>>>>>>>>> across all rows. Or we just leave it as an error and if >>>>>>>>>>>>>>>> someone wants the max across all rows they would ask for >>>>>>>>>>>>>>>> MAX(COLLECTION_MAX(column)). Yes I still agree COLLECTION_MAX >>>>>>>>>>>>>>>> may be a bad name. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> > On Dec 6, 2022, at 11:55 AM, Benedict <bened...@apache.org> >>>>>>>>>>>>>>>> > wrote: >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > As far as I am aware it has never worked in a release, and >>>>>>>>>>>>>>>> > so deprecating it is probably not as challenging as you >>>>>>>>>>>>>>>> > think. 
Only folk that have been able to parse the raw bytes >>>>>>>>>>>>>>>> > of the collection in storage format would be affected - >>>>>>>>>>>>>>>> > which we can probably treat as zero. >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> >> On 6 Dec 2022, at 17:31, Jeremiah D Jordan >>>>>>>>>>>>>>>> >> <jeremiah.jor...@gmail.com> wrote: >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> >>> 1. I think it is a mistake to offer a function MAX that >>>>>>>>>>>>>>>> >>> operates over rows containing collections, returning the >>>>>>>>>>>>>>>> >>> collection with the most elements. This is just a >>>>>>>>>>>>>>>> >>> nonsensical operation to support IMO. We should decide as >>>>>>>>>>>>>>>> >>> a community whether we “fix” this aggregation, or remove >>>>>>>>>>>>>>>> >>> it. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> The current MAX function does not work this way afaik? It >>>>>>>>>>>>>>>> >> returns the row with the column that has the highest value >>>>>>>>>>>>>>>> >> in the clustering-order sense, as if the collection were used >>>>>>>>>>>>>>>> >> as a clustering key. While that also may have limited use, >>>>>>>>>>>>>>>> >> I don’t think it worthwhile to deprecate such use and all >>>>>>>>>>>>>>>> >> the headache that comes with doing so. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >>> 2. I think “collection_" prefixed methods are >>>>>>>>>>>>>>>> >>> non-intuitive for discovery, and all-else equal it would >>>>>>>>>>>>>>>> >>> be better to use MAX,MIN, etc, same as for aggregations. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> If we actually wanted to move towards using the existing >>>>>>>>>>>>>>>> >> names with new meanings, then I think that would take us >>>>>>>>>>>>>>>> >> multiple major releases. First deprecate existing use in >>>>>>>>>>>>>>>> >> current releases. Then make it an error in the next major >>>>>>>>>>>>>>>> >> release X. Then change the behavior in major release X+1. 
>>>>>>>>>>>>>>>> >> Just switching the behavior without having a major where >>>>>>>>>>>>>>>> >> such queries error out would make a bunch of user queries >>>>>>>>>>>>>>>> >> start returning “wrong” data. >>>>>>>>>>>>>>>> >> Also I don’t think those functions being cross-row >>>>>>>>>>>>>>>> >> aggregations for some column types, but within-row >>>>>>>>>>>>>>>> >> collection operations for other types, is any more >>>>>>>>>>>>>>>> >> intuitive, and actually would be more confusing. So I am >>>>>>>>>>>>>>>> >> -1 on using the same names. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >>> 3. I think it is peculiar to permit methods named >>>>>>>>>>>>>>>> >>> collection_ to operate over non-collection types when they >>>>>>>>>>>>>>>> >>> are explicitly collection variants. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> While I could see some point to this, I do not think it >>>>>>>>>>>>>>>> >> would be confusing for something named collection_XXX to >>>>>>>>>>>>>>>> >> treat a non-collection as a collection of 1. But maybe >>>>>>>>>>>>>>>> >> there is a better name for these functions. Rather than >>>>>>>>>>>>>>>> >> seeing them as collection variants, we should see them as >>>>>>>>>>>>>>>> >> variants that operate on the data in a single row, rather >>>>>>>>>>>>>>>> >> than aggregating across multiple rows. But even with that >>>>>>>>>>>>>>>> >> perspective I don’t know what the best name would be. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >>>> On Dec 6, 2022, at 7:30 AM, Benedict >>>>>>>>>>>>>>>> >>>> <bened...@apache.org> wrote: >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> >>> Thanks Andres, I think community input on direction here >>>>>>>>>>>>>>>> >>> will be invaluable. There’s a bunch of interrelated >>>>>>>>>>>>>>>> >>> tickets, and my opinions are as follows: >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> >>> 1. I think it is a mistake to offer a function MAX that >>>>>>>>>>>>>>>> >>> operates over rows containing collections, returning the >>>>>>>>>>>>>>>> >>> collection with the most elements. 
This is just a >>>>>>>>>>>>>>>> >>> nonsensical operation to support IMO. We should decide as >>>>>>>>>>>>>>>> >>> a community whether we “fix” this aggregation, or remove >>>>>>>>>>>>>>>> >>> it. >>>>>>>>>>>>>>>> >>> 2. I think “collection_" prefixed methods are >>>>>>>>>>>>>>>> >>> non-intuitive for discovery, and all-else equal it would >>>>>>>>>>>>>>>> >>> be better to use MAX,MIN, etc, same as for aggregations. >>>>>>>>>>>>>>>> >>> 3. I think it is peculiar to permit methods named >>>>>>>>>>>>>>>> >>> collection_ to operate over non-collection types when they >>>>>>>>>>>>>>>> >>> are explicitly collection variants. >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> >>> Given (1), (2) becomes simple except for COUNT which >>>>>>>>>>>>>>>> >>> remains ambiguous, but this could be solved by either >>>>>>>>>>>>>>>> >>> providing a separate method for collections (e.g. SIZE) >>>>>>>>>>>>>>>> >>> which seems fine to me, or by offering a precedence order >>>>>>>>>>>>>>>> >>> for matching and a keyword for overriding the precedence >>>>>>>>>>>>>>>> >>> order (e.g. COUNT(collection AS COLLECTION)). >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> >>> Given (2), (3) is a little more difficult. However, I >>>>>>>>>>>>>>>> >>> think this can be solved several ways. >>>>>>>>>>>>>>>> >>> - We could permit explicit casts to collection types, that >>>>>>>>>>>>>>>> >>> for a collection type would be a no-op, and for a single >>>>>>>>>>>>>>>> >>> value would create a collection >>>>>>>>>>>>>>>> >>> - With precedence orders, by always selecting the scalar >>>>>>>>>>>>>>>> >>> function last >>>>>>>>>>>>>>>> >>> - By permitting WRITETIME to accept a binary operator >>>>>>>>>>>>>>>> >>> reduce function to resolve multiple values >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> >>> These decisions all imply trade-offs on each other, and >>>>>>>>>>>>>>>> >>> affect the evolution of CQL, so I think community input >>>>>>>>>>>>>>>> >>> would be helpful. 
>>>>> On 6 Dec 2022, at 12:44, Andrés de la Peña <adelap...@apache.org>
>>>>> wrote:
>>>>
>>>> This will require some long introduction for context:
>>>>
>>>> The MAX/MIN functions aggregate rows to get the row with the
>>>> min/max column value according to their comparator. For
>>>> collections, the comparison is on the lexicographical order of the
>>>> collection elements. That's the very same comparator that is used
>>>> when collections are used as clustering keys and for ORDER BY.
>>>>
>>>> However, a bug in the MIN/MAX aggregate functions meant that the
>>>> results were presented in their unserialized form, although the row
>>>> selection was correct. That bug was recently solved by
>>>> CASSANDRA-17811. During that ticket the option of simply disabling
>>>> MIN/MAX on collections was also considered, since applying those
>>>> functions to collections doesn't seem super useful. However, that
>>>> option was quickly discarded and the operation was fixed so the
>>>> MIN/MAX functions correctly work for every data type.
>>>>
>>>> As a byproduct of the internal improvements of that fix,
>>>> CASSANDRA-8877 introduced a new set of functions that can perform
>>>> aggregations of the elements of a collection. Those were named
>>>> "map_keys", "map_values", "collection_min", "collection_max",
>>>> "collection_sum", and "collection_count".
>>>> Those are the names mentioned on the mailing list thread about
>>>> function naming conventions. Despite doing a kind of
>>>> within-collection aggregation, these functions are not what we
>>>> usually call aggregate functions, since they don't aggregate
>>>> multiple rows together.
>>>>
>>>> On a different line of work, CASSANDRA-17425 added to trunk a
>>>> MAXWRITETIME function to get the max timestamp of a multi-cell
>>>> column. However, the new collection functions can be used in
>>>> combination with the WRITETIME and TTL functions to retrieve the
>>>> min/max/sum/avg timestamp or TTL of a multi-cell column. Since the
>>>> new functions give a generic way of aggregating the timestamps and
>>>> TTLs of multi-cell columns, CASSANDRA-18078 proposed to remove that
>>>> MAXWRITETIME function.
>>>>
>>>> Yifan Cai, author of the MAXWRITETIME function, agreed to remove
>>>> that function in favour of the new generic collection functions.
>>>> However, the MAXWRITETIME function can work on both single-cell and
>>>> multi-cell columns, whereas "COLLECTION_MAX(WRITETIME(column))"
>>>> would only work on multi-cell columns. That's because MAXWRITETIME
>>>> of a not-multicell column doesn't return a collection, and one
>>>> should simply use "WRITETIME(column)" instead. So it was proposed
>>>> in CASSANDRA-18037 that collection functions applied to a
>>>> not-collection value consider that value as the only element of a
>>>> singleton collection.
>>>> So, for example, COLLECTION_MAX(7) = COLLECTION_MAX([7]) = 7. That
>>>> ticket has already been reviewed and it's mostly ready to commit.
>>>>
>>>> Now we can go straight to the point:
>>>>
>>>> Recently Benedict brought back the idea of deprecating aggregate
>>>> functions applied to collections, the very same idea that was
>>>> mentioned in the CASSANDRA-17811 description almost four months
>>>> ago. That way we could rename the new collection functions
>>>> MIN/MAX/SUM/AVG, same as the classic aggregate functions. That way
>>>> MIN/MAX/SUM/AVG would be an aggregate function when applied to
>>>> not-collection columns, and a scalar function when applied to
>>>> collections. We can't do that with COUNT because there would be an
>>>> ambiguity, so the proposal for that case is renaming
>>>> COLLECTION_COUNT to SIZE. Benedict, please correct me if I'm not
>>>> correctly exposing the proposal.
>>>>
>>>> I however would prefer to keep aggregate functions working on
>>>> collections, and keep the names of the new collection functions as
>>>> "COLLECTION_*". Reasons are:
>>>>
>>>> 1 - Making aggregate functions not work on collections might be
>>>> considered as breaking backward compatibility and require a
>>>> deprecation plan.
>>>> 2 - Keeping aggregate functions working on collections might not
>>>> look super useful, but they make the set of aggregate functions
>>>> consistent and applicable to every column type.
>>>> 3 - Using the "COLLECTION_" prefix on collection functions
>>>> establishes a clear distinction between row aggregations and
>>>> collection aggregations, while at the same time exposing the
>>>> analogy between each pair of functions.
>>>> 4 - Not using the "COLLECTION_" prefix forces us to search for
>>>> workarounds such as using the column type when possible, or trying
>>>> to figure out synonyms like in the case of COUNT/SIZE. Even if that
>>>> works for this case, future functions can find more trouble when
>>>> trying to figure out workarounds to avoid clashing with existing
>>>> function names. For example, we might want to add a SIZE function
>>>> that gets the size in bytes of any column, or we might want to add
>>>> a MAX function that gets the maximum of a set of columns, etc. An
>>>> example of the synonym-based approach that comes to mind is MySQL's
>>>> MAX and GREATEST functions, where MAX is for row aggregation and
>>>> GREATEST is for column aggregation.
>>>> 5 - If MIN/MAX function selection is based on the column type, we
>>>> can't implement Yifan's proposal of making COLLECTION_MAX(7) =
>>>> COLLECTION_MAX([7]) = 7, which would be very useful for combining
>>>> collection functions with time functions.
>>>>
>>>> What do others think? What should we do with aggregate functions on
>>>> collections, collection functions, and MAXWRITETIME?
>>
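The singleton-collection semantics from CASSANDRA-18037 discussed above, where COLLECTION_MAX(7) = COLLECTION_MAX([7]) = 7, can be modelled in a few lines of Python. This is a hypothetical sketch of the behaviour being proposed, not Cassandra internals:

```python
# Hypothetical sketch of the CASSANDRA-18037 proposal: collection
# functions applied to a non-collection value treat that value as the
# only element of a singleton collection.

def as_collection(value):
    """Pass lists/sets through; wrap any scalar as a one-element list."""
    if isinstance(value, (list, set)):
        return list(value)
    return [value]

def collection_max(value):
    return max(as_collection(value))

def collection_count(value):
    return len(as_collection(value))

print(collection_max(7))          # 7: scalar treated as [7]
print(collection_max([3, 9, 1]))  # 9
print(collection_count(7))        # 1
```

This wrapping is what lets COLLECTION_MAX(WRITETIME(column)) work uniformly: WRITETIME of a multi-cell column yields a collection of timestamps, while WRITETIME of a single-cell column yields one timestamp that is simply treated as a singleton.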