Indeed, requiring ALLOW FILTERING (AF) for "select * from ks.tb where p1 = 1
and c1 = 2 and col2 = 1", where p1 and c1 are all the columns in the primary
key, sounds like a bug.

I think the criterion in the code is that we require AF if there is any
column restriction that cannot be processed by the primary key or a
secondary index. The error message indeed seems to reject any kind of
filtering, regardless of any primary key restrictions. We can see this even
without any clustering columns defined:

CREATE TABLE t (k int PRIMARY KEY, v int);
SELECT * FROM t WHERE k = 1 AND v = 1; -- requires AF

That clashes with the documentation, which says that AF is required for
filters that require scanning all partitions. If we were to adapt the code
to the behaviour described in the documentation, we shouldn't require AF if
there are restrictions specifying a partition key. Or possibly a group of
partition keys, if an IN restriction is used. So neither within-row nor
within-partition filtering would require AF.
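
To make the difference between the two criteria concrete, here is a minimal
Python sketch. It is a toy model and not the actual Cassandra code: the Table
class and both function names are made up for illustration, only equality/IN
restrictions are considered, and clustering-prefix ordering is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Hypothetical, simplified table description (not a Cassandra class)."""
    partition_key: frozenset
    clustering: tuple = ()
    indexed: frozenset = frozenset()


def requires_af_current(table, restricted):
    """Criterion as described above: AF is needed as soon as one restricted
    column cannot be served by the primary key or a secondary index."""
    restricted = set(restricted)
    handled = set(table.indexed)
    # Assumption: primary key columns only count as handled when the whole
    # partition key is restricted.
    if table.partition_key <= restricted:
        handled |= table.partition_key | set(table.clustering)
    return any(col not in handled for col in restricted)


def requires_af_documented(table, restricted):
    """Criterion as the documentation reads: AF is needed only when the
    query would have to scan all partitions."""
    restricted = set(restricted)
    if table.partition_key <= restricted:
        # The query is confined to the named partition(s).
        return False
    return any(col not in table.indexed for col in restricted)


# The two tables used in this thread.
t = Table(partition_key=frozenset({"k"}))
tb = Table(partition_key=frozenset({"p1"}), clustering=("c1",))

for table, cols in [(t, {"k", "v"}),
                    (tb, {"p1", "col1"}),
                    (tb, {"p1", "c1", "col2"}),
                    (tb, {"col1"})]:
    print(sorted(cols),
          "current:", requires_af_current(table, cols),
          "documented:", requires_af_documented(table, cols))
```

Under these assumptions, the two rules disagree for every query that names
the partition key and only agree on the last one, which restricts col1 alone
and would have to scan the whole table.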

Regarding adding a new ALLOW FILTERING WITHIN PARTITION, I think we could
just add a guardrail to directly disallow those queries, without needing to
add the WITHIN PARTITION clause to the CQL grammar.

On Thu, 13 Apr 2023 at 11:11, Henrik Ingo <henrik.i...@datastax.com> wrote:

>
>
> On Thu, Apr 13, 2023 at 10:20 AM Miklosovic, Stefan <
> stefan.mikloso...@netapp.com> wrote:
>
>> Somebody correct me if I am wrong but "partition key" itself is not
>> enough (primary keys = partition keys + clustering columns). It will
>> require ALLOW FILTERING when clustering columns are not specified either.
>>
>> create table ks.tb (p1 int, c1 int, col1 int, col2 int, primary key (p1,
>> c1));
>> select * from ks.tb where p1 = 1 and col1 = 2;     // this will require
>> allow filtering
>>
>> The documentation seems to omit this fact.
>>
>
> It does seem so.
>
> That said, personally I was assuming, and would still argue it's the
> optimal choice, that the documentation was right and reality is wrong.
>
> If there is a partition key, then the query can avoid scanning the entire
> table, across all nodes, potentially petabytes.
>
> If a query specifies a partition key but not the full clustering key, of
> course there will be some scanning needed, but this is marginal compared to
> the need to scan the entire table. Even in the worst case, a partition with
> 2 billion cells, we are talking about seconds to filter the result from the
> single partition.
>
> > Aha I get what you all mean:
>
> No, I actually think both are unnecessary. But yeah, certainly this latter
> case is a bug?
>
> henrik
>
> --
>
> Henrik Ingo
