RE: [DISCUSS] Improve Commitlog write path

2022-07-26 Thread Pawar, Amit

Hi Bowen,

Thanks for the reply; it helped to identify the failure point. I tested 
compaction throughput with different values, and the threads active in 
compaction report the "java.lang.OutOfMemoryError: Map failed" error earlier 
with 1024 MB/s than with other values. This shows that with lower throughput 
such issues will still come up, not immediately but over days or weeks. Test 
results are given below.

+---------+------------------------------+--------------------------+-----------------+
| Records | Compaction throughput (MB/s) | 5 large files (GB)       | Disk usage (GB) |
+---------+------------------------------+--------------------------+-----------------+
| 20      | 8                            | Not collected            | 500             |
| 20      | 16                           | Not collected            | 500             |
| 9       | 64                           | 3.5, 3.5, 3.5, 3.5, 3.5  | 273             |
| 9       | 128                          | 3.5, 3.9, 4.9, 8.0, 15   | 287             |
| 9       | 256                          | 11, 11, 12, 16, 20       | 359             |
| 9       | 512                          | 14, 19, 23, 27, 28       | 469             |
| 9       | 1024                         | 14, 18, 23, 27, 28       | 458             |
| 9       | 0 (unthrottled)              | 6.9, 6.9, 7.0, 28, 28    | 223             |
+---------+------------------------------+--------------------------+-----------------+

Issues observed with increasing compaction throughput:

  1.  Out-of-memory errors
  2.  Scores decrease as throughput is increased
  3.  File sizes grow as throughput is increased
  4.  Insert failures are observed


After this testing, I feel that this change is beneficial for workloads where 
data is not kept on nodes for too long. With lower compaction throughput, a 
large system can ingest more data. Does that make sense?

Thanks,
Amit

From: Bowen Song via dev 
Sent: Friday, July 22, 2022 4:37 PM
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Improve Commitlog write path

Hi Amit,



The compaction bottleneck is not an instantly visible limitation. In effect it 
limits the total size of writes over a fairly long period of time, because 
compaction is asynchronous and can be queued. That means if compaction can't 
keep up with the writes, they will be queued, and Cassandra remains fully 
functional until it hits the "too many open files" error or the filesystem 
runs out of free inodes. This can happen over many days or even weeks.
For the purpose of benchmarking, you may prefer to measure the max concurrent 
compaction throughput, instead of actually waiting for that breaking moment. 
The max write throughput is a fraction of the max concurrent compaction 
throughput, usually by a factor of 5 or more for a non-trivial sized table, 
depending on the table size in bytes. Search for "STCS write amplification" to 
understand why that's the case. That means if you've measured the max 
concurrent compaction throughput is 1GB/s, your average max insertion speed 
over a period of time is probably less than 200MB/s.

If you really decide to test the compaction bottleneck in action, it's better 
to measure the table size in bytes on disk rather than the number of records. 
That's because not only the record count but also the size of partitions and 
the compression ratio all have a meaningful effect on the compaction workload. 
It's also worth mentioning that if you use the STCS strategy, which is more 
suitable for write-heavy workloads, you may want to keep an eye on the SSTable 
data file size distribution. Initially compaction may not involve any large 
SSTable data files, so it won't be a bottleneck at all. As more and bigger 
SSTable data files are created over time, they will get involved in compactions 
more and more frequently. The bottleneck will only show up (i.e. become 
problematic) when a sufficient number of large SSTable data files are involved 
in multiple concurrent compactions, occupying all available compactors and 
blocking (queuing) a larger number of compactions involving smaller SSTable 
data files.

Regards,

Bowen


On 22/07/2022 11:19, Pawar, Amit wrote:


Thank you Bowen for your reply. It took some time to respond due to a testing issue.

I tested again multi-threaded feature with number of rec

Inclusive/exclusive endpoints when compacting token ranges

2022-07-26 Thread Andrés de la Peña
Hi all,

CASSANDRA-17575 has detected that token ranges in nodetool compact are
interpreted as closed on both sides. For example, the command "nodetool
compact -st 10 -et 50" will compact the tokens in [10, 50]. This way of
interpreting token ranges is unusual since token ranges are usually
half-open, and I think that in the previous example one would expect that
the compacted tokens would be in (10, 50]. That's for example the way
nodetool repair works, and indeed the class org.apache.cassandra.dht.Range
is always half-open.
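
To make the two interpretations concrete (a minimal sketch, not Cassandra's
org.apache.cassandra.dht.Range):

    class RangeSemantics {
        // Closed range [st, et] -- how nodetool compact currently reads -st/-et.
        static boolean inClosed(long token, long st, long et) {
            return token >= st && token <= et;
        }
        // Half-open range (st, et] -- the usual interpretation, e.g. nodetool repair.
        // Caveat: st == et read as half-open wraps around and covers the whole
        // ring, which is the severe case discussed below. Not modelled here.
        static boolean inHalfOpen(long token, long st, long et) {
            return token > st && token <= et;
        }
        public static void main(String[] args) {
            System.out.println(inClosed(10, 10, 50));   // true: token 10 is compacted today
            System.out.println(inHalfOpen(10, 10, 50)); // false: token 10 would be excluded
        }
    }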

It's worth mentioning that, unlike nodetool repair, the help and documentation 
for nodetool compact don't specify whether the supplied start/end tokens are 
inclusive or exclusive.

I think that ideally nodetool compact should interpret the provided token
ranges as half-open, to be consistent with how token ranges are usually
interpreted. However, this would change the way the tool has worked until
now. This change might be problematic for existing users relying on the old
behaviour. That would be especially severe for the case where the begin and
end token are the same, because interpreting the range as [x, x] we would
compact a single token, whereas I think that interpreting it as (x, x] would
compact all the tokens. As for compacting ranges including multiple tokens, I
think the change wouldn't be so bad, since the supplied token ranges probably
come from tools that are already presenting the ranges as half-open. Also, if
we are splitting the full ring into smaller ranges, half-open intervals would
still work and would save us some repetitions.

So my question is: Should we change the behaviour of nodetool compact to
interpret the token ranges as half-open, aligning it with the usual
interpretation of ranges? Or should we just document the current odd
behaviour to prevent compatibility issues?

A third option would be changing to half-open ranges and also forbidding
ranges where the begin and end token are the same, to prevent the
accidental compaction of the entire ring. Note that nodetool repair also
forbids this type of token range.

What do you think?


Re: [DISCUSS] Improve Commitlog write path

2022-07-26 Thread Bowen Song via dev

Hi Amit,

Those are some brilliant tests you have done there. They show that compaction 
throughput can not only be a bottleneck on the speed of insert 
operations, but can also stress the JVM garbage collector. As a result of 
GC pressure, other things, such as inserts, can fail.


Your last statement is correct. The commit log change can be beneficial 
for atypical workloads where a large volume of data is inserted and then 
expires soon, for example when using the 
TimeWindowCompactionStrategy with a short TTL. But I must point out that 
this kind of atypical usage is often an anti-pattern in Cassandra, as 
Cassandra is a database, not a queue or cache system.
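
For reference, that kind of workload is typically set up along these lines 
(illustrative CQL only; the keyspace/table names and window settings are made 
up for the example):

    CREATE TABLE ks.events (
        id uuid,
        ts timestamp,
        payload blob,
        PRIMARY KEY (id, ts)
    ) WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'HOURS',
        'compaction_window_size': 1
    } AND default_time_to_live = 3600;  -- short TTL: rows expire soon after insert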


This, however, is not to say the commit log change should not be 
introduced. As others have pointed out, it's down to a balancing act 
between cost and benefit, and it will depend on the code complexity 
and the effect it has on typical workloads, such as CPU and JVM heap 
usage. After all, we should prioritise the performance and reliability 
of typical usage before optimising for atypical use cases.


Best,
Bowen

On 26/07/2022 12:41, Pawar, Amit wrote:

> [...]

RE: [DISCUSS] Improve Commitlog write path

2022-07-26 Thread Pawar, Amit

Hi Bowen,

Thanks for your reply. Now it is clear what some of the benefits of this 
patch are. I will send it for review once it is ready, and hopefully it gets 
accepted.

Thanks,
Amit

From: Bowen Song via dev 
Sent: Tuesday, July 26, 2022 5:36 PM
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Improve Commitlog write path

[...]

Re: Inclusive/exclusive endpoints when compacting token ranges

2022-07-26 Thread J. D. Jordan
I like the third option, especially if it makes it consistent with repair, 
which has supported ranges longer and I would guess most people would think the 
compact ranges work the same as the repair ranges.

-Jeremiah Jordan

> On Jul 26, 2022, at 6:49 AM, Andrés de la Peña  wrote:
> [...]


Re: Inclusive/exclusive endpoints when compacting token ranges

2022-07-26 Thread Brandon Williams
+1, I think that makes the most sense.

Kind Regards,
Brandon

On Tue, Jul 26, 2022 at 8:19 AM J. D. Jordan  wrote:
> [...]


Re: Inclusive/exclusive endpoints when compacting token ranges

2022-07-26 Thread Benedict Elliott Smith

I think a change like this could be dangerous for a lot of existing automation 
built atop nodetool.

I’m not sure this change is worthwhile. I think it would be better to introduce 
e.g. -ste and -ete for “start token exclusive” and “end token exclusive” so 
that users can opt-in to whichever scheme they prefer for their tooling, 
without breaking existing users.
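
If flags along those lines were adopted, the opt-in might look like this 
(purely illustrative; neither -ste nor -ete exists today, and their exact 
shape would be settled in review):

    nodetool compact -st 10 -et 50 ks tbl    # today: closed range [10, 50]
    nodetool compact -ste 10 -et 50 ks tbl   # proposed opt-in: half-open range (10, 50]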

> On 26 Jul 2022, at 14:22, Brandon Williams  wrote:
> [...]




Re: Inclusive/exclusive endpoints when compacting token ranges

2022-07-26 Thread Ekaterina Dimitrova
I also like the idea of the flags, plus improving the documentation; my
understanding is that this behaviour is not really documented, which can be
confusing and risky for end users.

On Tue, 26 Jul 2022 at 9:28, Benedict Elliott Smith wrote:

> [...]

Re: Inclusive/exclusive endpoints when compacting token ranges

2022-07-26 Thread Josh McKenzie
+1 to Benedict and Ekaterina's points - adding new flags to explicitly 
introduce the new behavior and documenting the hell out of both the default and 
the new flags seems like the right play. There's quite possibly a lot of 
tooling out there in the wild that relies on the current behavior and breaking 
that for users is Bad.

~Josh

On Tue, Jul 26, 2022, at 10:05 AM, Ekaterina Dimitrova wrote:
> [...]


Re: Inclusive/exclusive endpoints when compacting token ranges

2022-07-26 Thread Derek Chen-Becker
+1 to new flags. A released, albeit undocumented, behavior is still a
contract with the end user. Flags (and documentation) seem like the right
path to address the situation.

Cheers,

Derek

On Tue, Jul 26, 2022 at 7:28 AM Benedict Elliott Smith wrote:

> [...]

-- 
+---+
| Derek Chen-Becker |
| GPG Key available at https://keybase.io/dchenbecker and   |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---+


Re: Inclusive/exclusive endpoints when compacting token ranges

2022-07-26 Thread Jeremiah D Jordan
Reading the responses here and taking a step back, I think the current behavior 
of nodetool compact is probably the correct behavior.  The main use case I can 
see for using nodetool compact is that someone wants to take an sstable and 
compact it with all the overlapping sstables.  So you run "sstablemetadata" on 
the sstable and get the min and max tokens, and then you pass those in to 
nodetool compact.  In that case you do want the closed range.
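
A sketch of that workflow (illustrative; sstablemetadata's output format 
varies by version, and the keyspace/table names are made up):

    $ sstablemetadata /var/lib/cassandra/data/ks/tbl-*/nb-1-big-Data.db | grep -i token
    First token: 10 (...)
    Last token: 50 (...)
    $ nodetool compact -st 10 -et 50 ks tbl   # closed range [10, 50] covers the sstable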

This is different from running repair, where you get the tokens from the 
nodes / nodetool ring, and node-level token range ownership is half-open, 
going from "token owned by node A" to "token owned by node B".

So my initial thought/gut reaction that it should work like repair is 
misleading, because you don’t get the tokens from the same place you get them 
when running repair.

Making the command line options more explicit and documented does seem like it 
could be useful.

-Jeremiah Jordan

> On Jul 26, 2022, at 9:16 AM, Derek Chen-Becker  wrote:
> [...]

Re: Inclusive/exclusive endpoints when compacting token ranges

2022-07-26 Thread Andrés de la Peña
I think that's right: using a closed range makes sense to consume the data
provided by "sstablemetadata", which also provides closed ranges.
Especially because with half-open ranges we couldn't compact a sstable with
a single big partition, of which we might only know the token but not the
partition key.

So probably we should just add documentation about both -st and -et being
inclusive, and live with a different meaning of -st in repair and compact.

Also, the reason why this is so confusing in the test that started the
discussion is that those closed token ranges are internally represented as
Range<Token> objects, which are half-open by definition. So we should
document those methods, and maybe make some minor changes to avoid using
Range<Token> to silently represent closed token ranges.

On Tue, 26 Jul 2022 at 16:27, Jeremiah D Jordan wrote:

> [...]