RE: [DISCUSS] Improve Commitlog write path

2022-07-22 Thread Pawar, Amit
[Public]

Thank you, Bowen, for your reply. It took me some time to respond due to a testing issue.

I tested the multi-threaded feature again with record counts from 260 million 
to 2 billion, and the improvement still holds at around 80% of the ramdisk 
score. It is still possible that compaction becomes the new bottleneck, which 
could be a new opportunity to fix. I am a newbie here, so it is possible that 
I failed to understand your suggestion completely. At least with this testing, 
the multi-threading benefit is reflected in the score.

Do you think multi-threading is good to have now? Otherwise, please suggest 
whether I need to test further.

Thanks,
Amit

From: Bowen Song via dev 
Sent: Wednesday, July 20, 2022 4:13 PM
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Improve Commitlog write path

[CAUTION: External Email]

From my past experience, the bottleneck for insert heavy workload is likely to 
be compaction, not commit log. You initially may see commit log as the 
bottleneck when the table size is relatively small, but as the table size 
increases, compaction will likely take its place and become the new bottleneck.
On 20/07/2022 11:11, Pawar, Amit wrote:

[Public]

Hi all,

(My previous mail is not appearing in mailing list and resending again after 2 
days)

Myself Amit and working at AMD Bangalore, India. I am new to Cassandra and need 
to do Cassandra testing on large core systems. Usually should test on 
multi-nodes Cassandra but started with Single node testing to understand how 
Cassandra scales with increasing core counts.

Test details:
Operation: Insert > 90% (insert heavy)
Operation: Scan < 10%
Cassandra: 3.11.10 and trunk
Benchmark: TPCx-IOT (similar to YCSB)

Results show that scaling is almost linear up to 16 cores but poor beyond 
that. The following common settings helped to get better scores.

  1.  Memtable heap allocation: offheap_objects
  2.  memtable_flush_writers > 4
  3.  Java heap: 8-32GB with survivor ratio tuning
  4.  Separate storage space for Commitlog and Data.
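
For reference, the first, second, and fourth settings above map onto cassandra.yaml roughly as follows. This is a sketch only: the flush-writer count and the mount paths are illustrative values, and the Java heap size and survivor ratio from item 3 are tuned in jvm.options rather than here.

```yaml
# Sketch of the cassandra.yaml settings discussed above (paths are examples)
memtable_allocation_type: offheap_objects   # keep memtable objects off-heap
memtable_flush_writers: 8                   # > 4 helped on large core counts
commitlog_directory: /mnt/commitlog         # separate device from data files
data_file_directories:
    - /mnt/data
```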

Many online blogs suggest adding a new Cassandra node when the cluster cannot 
absorb high write rates. But large systems with many cores should be able to 
absorb high writes easily; the goal was to improve scaling with more cores, so 
this suggestion didn't help. After many rounds of testing, it was observed 
that the current implementation uses a single thread for commitlog syncing. 
Commitlog files are mapped using the mmap system call and changes are written 
out with msync. Profiling the periodic sync with the JVisualVM tool shows:

  1.  The sync thread is not 100% busy when a ramdisk is used for commitlog 
storage, and scaling improves on large systems. Ramdisk scores > 2x the NVMe 
score.
  2.  The sync thread becomes 100% busy when NVMe is used for the commitlog, 
and the score does not improve much beyond 16 cores.
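
The mmap/msync write path described above can be sketched with Java NIO, where MappedByteBuffer.force() issues the msync under the hood. This is an illustration only; the file name and sizes are arbitrary, not Cassandra's actual commitlog code.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSyncDemo {
    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("commitlog-demo", ".log");
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Map one 4K page of the file read-write (grows the file to 4096 bytes)
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.put("mutation-bytes".getBytes()); // write lands in the page cache
            buf.force(); // msync(2): flush the dirty page to the storage device
        }
        Files.delete(path);
        System.out.println("synced");
    }
}
```

A single thread calling force() serially for each dirty segment is the behavior observed to saturate one core on NVMe.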

The Linux kernel uses 4K pages for memory mapped with the mmap system call. 
To understand this further, disk I/O testing was done using the FIO tool, and 
the results show:

  1.  NVMe 4K random R/W throughput is very low with a single thread and 
improves with multiple threads.
  2.  Ramdisk 4K random R/W throughput is already good with a single thread 
and is also better with multiple threads.
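
The FIO experiment can be described as a job file along these lines. This is a hedged sketch: the filename, size, and ioengine are assumptions to adapt to the system under test, and raising numjobs reproduces the single- vs. multi-threaded comparison.

```ini
; hypothetical fio job approximating the 4K random-write test above
[commitlog-4k]
rw=randwrite          ; 4K random writes, like commitlog page syncs
bs=4k
direct=1              ; bypass the page cache
size=1g
numjobs=1             ; raise to e.g. 8 to compare multi-threaded throughput
ioengine=libaio
filename=/mnt/nvme/fio-test
```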

Based on the FIO test results, the following two ideas were tested for 
commitlog files against the Cassandra 3.11.10 sources.

  1.  Enable the Direct I/O feature for commitlog files (similar to 
[CASSANDRA-14466] Enable Direct I/O - ASF JIRA).
  2.  Enable multi-threaded syncing for commitlog files.

The first one needs retesting. Interestingly, the second one helped improve 
the score with the NVMe disk: the NVMe configuration score is within 80-90% of 
the ramdisk score and 2x the single-threaded implementation. Multithreading 
was enabled by adding a new thread pool in the "AbstractCommitLogSegmentManager" 
class and changing the syncing thread into a manager thread for this new pool, 
which takes care of synchronization. This was only tested with Cassandra 
3.11.10 and needs complete testing, but the change works in my test 
environment. I tried these few experiments so that I could discuss them here 
and seek your valuable suggestions to identify the right fix for insert-heavy 
workloads.
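
The thread-pool idea can be sketched in plain Java as follows. This is not the actual patch: Segment is a stand-in for Cassandra's commitlog segment type, and the pool size of 4 is an arbitrary example; the real change lives in AbstractCommitLogSegmentManager.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelSegmentSync {
    interface Segment { void sync(); } // stand-in for a commitlog segment

    // Manager thread fans segment syncs out to a pool and waits for all of
    // them, instead of syncing each segment serially on one thread.
    static void syncAll(List<Segment> segments, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(segments.size());
        for (Segment s : segments)
            pool.submit(() -> { try { s.sync(); } finally { done.countDown(); } });
        done.await(); // block until every segment has been synced
        pool.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger synced = new AtomicInteger();
        List<Segment> segs = new ArrayList<>();
        for (int i = 0; i < 8; i++) segs.add(synced::incrementAndGet);
        syncAll(segs, 4);
        System.out.println("synced=" + synced.get()); // prints synced=8
    }
}
```

The key design question, as the thread notes, is making the manager thread coordinate completion correctly so durability guarantees are preserved.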


  1.  Is it a good idea to convert the single-threaded syncing to a 
multi-threaded implementation to improve disk I/O?
  2.  Direct I/O throughput is high with a single thread and is a good fit 
for the commitlog case due to file size; it would improve writes on small to 
large systems. Is it worth bringing this support to commitlog files?
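
On the JDK side, Direct I/O for a file can be probed roughly as below. This is a sketch, not the proposed patch: com.sun.nio.file.ExtendedOpenOption.DIRECT is available from JDK 10, and O_DIRECT requires block-aligned buffers and sizes, which is why the buffer is aligned with alignedSlice. Some filesystems (e.g. tmpfs) reject O_DIRECT, so the sketch falls back gracefully.

```java
import com.sun.nio.file.ExtendedOpenOption;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectIoProbe {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("commitlog-direct", ".log");
        int block = 4096; // O_DIRECT needs block-aligned buffers and lengths
        ByteBuffer buf = ByteBuffer.allocateDirect(block * 2).alignedSlice(block);
        buf.limit(block);
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE,
                ExtendedOpenOption.DIRECT)) {
            ch.write(buf, 0); // bypasses the page cache entirely
            System.out.println("direct write ok");
        } catch (UnsupportedOperationException | IOException e) {
            // tmpfs and some filesystems do not support O_DIRECT
            System.out.println("direct I/O unavailable: falling back");
        }
        Files.delete(p);
    }
}
```

On Java 8, which lacks this option, enabling O_DIRECT would require JNI, as discussed later in the thread.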

Please suggest.

Thanks,
Amit Pawar


Re: [DISCUSS] Improve Commitlog write path

2022-07-22 Thread Benedict
Hi Amit,

I am inclined to agree with Bowen Song, in that benchmarks on an initially 
empty cluster tend to lean more heavily on memtable and commit log bottlenecks 
than a real-world long-running cluster does, as the algorithmic complexity of 
LSMTs begins to bite much later while the cost of the commit log and memtable 
stays fairly constant. The more data you have, the less commit log and 
memtable performance directly matters, and memtable size becomes much more 
important along with compaction efficiency.

That said, reducing bottlenecks is still a good thing if the additional 
complexity is not severe - and this is still an unfortunately common way that 
we benchmark changes today, anyway.


> On 22 Jul 2022, at 11:20, Pawar, Amit  wrote:
> [quoted text of Amit's earlier message trimmed]

Re: [DISCUSS] Improve Commitlog write path

2022-07-22 Thread Bowen Song via dev

Hi Amit,


The compaction bottleneck is not an instantly visible limitation. It in 
effect limits the total size of writes over a fairly long period of time, 
because compaction is asynchronous and can be queued. That means if compaction 
can't keep up with the writes, they will be queued, and Cassandra remains 
fully functional until it hits the "too many open files" error or the 
filesystem runs out of free inodes. This can happen over many days or even 
weeks.


For the purpose of benchmarking, you may prefer to measure the max 
concurrent compaction throughput, instead of actually waiting for that 
breaking moment. The max write throughput is a fraction of the max 
concurrent compaction throughput, usually by a factor of 5 or more for a 
non-trivial sized table, depending on the table size in bytes. Search 
for "STCS write amplification" to understand why that's the case. That 
means if you've measured the max concurrent compaction throughput is 
1GB/s, your average max insertion speed over a period of time is 
probably less than 200MB/s.


If you really decide to test the compaction bottleneck in action, it's 
better to measure the table size in bytes on disk rather than the number of 
records. That's because not only the record count, but also the size of 
partitions and the compression ratio, all have a meaningful effect on the 
compaction workload. It's also worth mentioning that if you are using the 
STCS strategy, which is more suitable for write-heavy workloads, you may 
want to keep an eye on the SSTable data file size distribution. Initially 
the compactions may not involve any large SSTable data files, so they won't 
be a bottleneck at all. As more and bigger SSTable data files are created 
over time, they will get involved in compactions more and more frequently. 
The bottleneck will only show up (i.e. become problematic) when there is a 
sufficient number of large SSTable data files involved in multiple concurrent 
compactions, occupying all available compactors and blocking (queuing) a 
larger number of compactions involving smaller SSTable data files.



Regards,

Bowen


On 22/07/2022 11:19, Pawar, Amit wrote:


[quoted text of Amit's earlier message trimmed]

Re: [DISCUSS] Improve Commitlog write path

2022-07-22 Thread Brad
When thinking about compaction vs. commit log bottlenecks, there would be
very different profiles between TWCS and STCS, as well as for transient
tables with short TTLs, which never accumulate large data but have heavy
I/O.

Amit's analysis strikes me as insightful. Multi-threading the commit log
might resolve a pinch point for some classes of workloads, particularly if
it could be done in a reactive manner and wasn't too complex.

On Fri, Jul 22, 2022 at 6:19 AM Pawar, Amit  wrote:

> [Public]
>
>
>
> Thank you Bowen for your reply. Took some time to respond due to testing
> issue.
>
>
>
> I tested again multi-threaded feature with number of records from 260
> million to 2 billion and still improvement is seen around 80% of Ramdisk
> score. It is still possible that compaction can become new bottleneck and
> could be new opportunity to fix it. I am newbie here and possible that I
> failed to understand your suggestion completely.  At-least with this
> testing multi-threading benefit is reflecting in score.
>
>
>
> Do you think multi-threading is good to have now ? else please suggest if
> I need to test further.
>
>
>
> Thanks,
>
> Amit
>
>
>
> *From:* Bowen Song via dev 
> *Sent:* Wednesday, July 20, 2022 4:13 PM
> *To:* dev@cassandra.apache.org
> *Subject:* Re: [DISCUSS] Improve Commitlog write path
>
>
>
> [CAUTION: External Email]
>
> From my past experience, the bottleneck for insert heavy workload is
> likely to be compaction, not commit log. You initially may see commit log
> as the bottleneck when the table size is relatively small, but as the table
> size increases, compaction will likely take its place and become the new
> bottleneck.
>
> On 20/07/2022 11:11, Pawar, Amit wrote:
>
> [Public]
>
>
>
> Hi all,
>
>
>
> (My previous mail is not appearing in mailing list and resending again
> after 2 days)
>
>
>
> Myself Amit and working at AMD Bangalore, India. I am new to Cassandra and
> need to do Cassandra testing on large core systems. Usually should test on
> multi-nodes Cassandra but started with Single node testing to understand
> how Cassandra scales with increasing core counts.
>
>
>
> Test details:
>
> Operation: Insert > 90% (insert heavy)
>
> Operation: Scan < 10%
>
> Cassandra: 3.11.10 and trunk
>
> Benchmark: TPCx-IOT (similar to YCSB)
>
>
>
> Results shows scaling is poor beyond 16 cores and it is almost linear.
> Following settings are the common settings helped to get the better scores.
>
>1. Memtable heap allocation: offheap_objects
>2. memtable_flush_writers > 4
>3. Java heap: 8-32GB with survivor ratio tuning
>4. Separate storage space for Commitlog and Data.
>
>
>
> Many online blogs suggest to add new Cassandra node when unable to take
> high writes. But with large systems, high writes should be easily taken due
> to many cores. Need was to improve the scaling with more cores so this
> suggestion didn’t help. After many rounds of testing it was observed that
> current implementation uses single thread for Commitlog syncing activity.
> Commitlog files are mapped using mmap system call and changes are written
> with msync. Periodic syncing with JVisualvm tool shows
>
>1. thread is not 100% busy with Ramdisk usage for Commitlog storage
>and scaling improved on large systems. Ramdisk scores > 2 X NVME score.
>2. thread becomes 100% busy with NVME usage for Commiglog and score
>does not improve much beyond 16 cores.
>
>
>
> Linux kernel uses 4K pages for mapped memory with mmap system call. So, to
> understand this further, disk I/O testing was done using FIO tool and
> results shows
>
>1. NVME 4K random R/W throughput is very less with single thread and
>it improves with multi-threaded.
>2. Ramdisk 4K random R/W throughput is good with single thread only
>and also better with multi-threaded
>
>
>
> Based on the FIO test results following two ideas were tested for
> Commitlog files with Cassandra-3.1.10 sources.
>
>1. Enable Direct IO feature for Commitlog files (similar to  
> [CASSANDRA-14466]
>Enable Direct I/O - ASF JIRA (apache.org)
>
> 
>)
>2. Enable Multi-threaded syncing for Commitlog files.
>
>
>
> First one need to retest. Interestingly second one helped to improve the
> score with “NVME” disk. NVME disk configuration score is almost within
> 80-90% of ramdisk and 2 times of single threaded implementation.
> Multithreading enabled by adding new thread pool in
> “AbstractCommitLogSegmentManager” class and changed syncing thread as
> manager thread for this new thread pool to take care synchro

RE: [DISCUSS] Improve Commitlog write path

2022-07-22 Thread Pawar, Amit
[AMD Official Use Only - General]

Hi Benedict,

The whole point is that Cassandra, as software, should take advantage of the 
hardware wherever possible. Reducing the commitlog bottleneck may help some 
workloads, though not all. I am already working on trunk now and will share 
the patch. If the changes look good and are not very complex, please give 
your feedback. Your input might help reduce the complexity of the change, and 
the patch could possibly be accepted.

Thanks,
Amit

From: Benedict 
Sent: Friday, July 22, 2022 3:56 PM
To: dev@cassandra.apache.org
Cc: Bowen Song ; Raghavendra, Prakash 

Subject: Re: [DISCUSS] Improve Commitlog write path

[quoted text of Benedict's reply and the earlier thread trimmed]
Re: CEP-15 multi key transaction syntax

2022-07-22 Thread Caleb Rackliffe
Avi brought up an interesting point around NULLness checking in
CASSANDRA-17762 ...

> In SQL, any comparison with NULL is NULL, which is interpreted as FALSE in
> a condition. To test for NULLness, you use IS NULL or IS NOT NULL. But LWT
> uses IF col = NULL as a NULLness test. This is likely to confuse people
> coming from SQL and hamper attempts to extend the dialect.


We can leave that Jira open to address what to do in the legacy LWT case,
but I'd support a SQL-congruent syntax here (IS NULL or IS NOT NULL), where
we have something closer to a blank slate.

Thoughts?

On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky  wrote:

> The new syntax looks great, and I’m really excited to see this coming
> together.
>
> One piece of feedback on the proposed syntax is around the use of “=“ as a
> declaration in addition to its current use as an equality operator in a
> WHERE clause and an assignment operator in an UPDATE:
>
> BEGIN TRANSACTION
>   LET car_miles = miles_driven, car_is_running = is_running FROM cars
> WHERE model=’pinto’
>   LET user_miles = miles_driven FROM users WHERE name=’blake’
>   SELECT something else from some other table
>   IF NOT car_is_running THEN ABORT
>   UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
>   UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
> COMMIT TRANSACTION
>
> This is supported in languages like PL/pgSQL, but in a normal SQL query this
> kind of local declaration is often expressed as an alias (SELECT col AS
> new_col), a subquery alias ((SELECT col) t), or a common table expression
> (WITH t AS (SELECT col)).
>
> Here’s an example of an alternative to the proposed syntax that I’d find
> more readable:
>
> BEGIN TRANSACTION
>   WITH car_miles, car_is_running AS (SELECT miles_driven, is_running FROM
> cars WHERE model=’pinto’),
>   user_miles AS (SELECT miles_driven FROM users WHERE name=’blake’)
>   IF NOT car_is_running THEN ABORT
>   UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
>   UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
> COMMIT TRANSACTION
>
> There’s also the option of naming the transaction like a subquery, and
> supporting LET via AS (this one I’m less sure about but wanted to propose
> anyway):
>
> BEGIN TRANSACTION t1
>   SELECT miles_driven AS t1.car_miles, is_running AS t1.car_is_running
> FROM cars WHERE model=’pinto’;
>   SELECT miles_driven AS t1.user_miles FROM users WHERE name=’blake’;
>   IF NOT car_is_running THEN ABORT
>   UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
>   UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
> COMMIT TRANSACTION
>
> This also has the benefit of resolving ambiguity in case of naming
> conflicts with existing (or future) column names.
>
> --
> Abe
>


Grant Read-Only access on Production Cassandra Keyspace

2022-07-22 Thread Bhavesh Prajapati via dev
Hi,

There is a requirement to grant Read-Only access to dev team on Production 
Cassandra Keyspace.

In RDS MySQL, we can leverage Read-Replica so that dev can run queries without 
causing any performance issue on live database.

How can I grant read-only access on a Cassandra keyspace and also ensure that 
it does not negatively impact the performance of the live cluster?

Thanks,
Bhavesh


Re: [DISCUSS] Improve Commitlog write path

2022-07-22 Thread C. Scott Andreas
Amit, welcome, and thank you for contributing the results from your test and 
opening this discussion.

I don't think anyone is arguing that the database shouldn't take advantage of 
available hardware. A few things are important to keep in mind when 
considering a patch like this:

- Where the actual bottleneck in the database will be for increased write 
throughput. As Bowen and Benedict mentioned, the amount of work performed by 
the commitlog versus the accrued cost of integrating flushed SSTables into 
the LSM tree is dramatically weighted toward compaction. A multi-day benchmark 
that allows the database to accrue and incorporate a sizable amount of data 
is much more likely to produce measurements that approximate what users of 
Cassandra may experience in production use.

- Making something multi-threaded doesn't reduce the amount of work done; it 
redistributes it. In a saturated system, this means resources are allocated 
in an environment of trade-offs. Allocating additional resources to the front 
door will reduce the resources available to compaction, live serving, etc. in 
environments where cores are not limitless and free. This is why the holistic 
view of performance others are speaking to is important.

- How such a change alters the balance of the database's threading model, and 
where the bottleneck moves to. Users who overrun the commitlog's capability 
today are likely to be even more negatively impacted by compaction overhead 
if backpressure is lost at the front door. The meta-point to consider is: 
"how does this change affect the performance characteristics of a live 
database?"

- We also need to balance complexity and correctness in the implementation. 
If the patch is straightforward, has a well-defined locking scheme, and 
ideally a suite of randomized tests, that can help mitigate concerns related 
to this.

It sounds like several would welcome such a patch for review. I just want to 
signpost that the gains and trade-offs aren't always clear cut, especially in 
cases where the improvement is a rebalancing of the database's threading 
model rather than a reduction in the amount of work performed.

The second item you mentioned, a direct I/O path for commitlog writes, sounds 
like an interesting potential addition.

One thing that may be useful to post along with your patch is a result from 
an extended tlp-stress run that includes both the live write path and the 
deferred compaction of the data written.

- Scott

On Jul 22, 2022, at 9:14 AM, Pawar, Amit  wrote:







[quoted text of Amit's and Benedict's earlier messages trimmed]
Re: Grant Read-Only access on Production Cassandra Keyspace

2022-07-22 Thread Guang Zhao
Hi Bhavesh,


In order to control access to Cassandra, you need to enable authentication 
and authorization. However, both are disabled in a10 cassandra, so 
I don't have much experience with this requirement.


Thanks,

Guang
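
To make this concrete: once authentication and authorization are enabled 
(`authenticator: PasswordAuthenticator` and `authorizer: CassandraAuthorizer` 
in cassandra.yaml), a read-only role can be sketched roughly as below. The 
role name, password, and keyspace name are placeholders, not anything from 
this thread:

```sql
-- cassandra.yaml prerequisites (restart required):
--   authenticator: PasswordAuthenticator
--   authorizer: CassandraAuthorizer

-- Create a login-capable role for the dev team (name/password are examples)
CREATE ROLE dev_readonly WITH PASSWORD = 'change_me' AND LOGIN = true;

-- Allow reads only, scoped to a single keyspace
GRANT SELECT ON KEYSPACE my_keyspace TO dev_readonly;
```

Note that, unlike an RDS read replica, a grant controls what queries are 
allowed but not the resources they consume; heavy SELECTs from dev would 
still run on the live nodes.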


From: Bhavesh Prajapati via dev 
Sent: Friday, July 22, 2022 9:29:04 AM
To: dev@cassandra.apache.org
Subject: Grant Read-Only access on Production Cassandra Keyspace

Hi,

There is a requirement to grant read-only access to the dev team on a 
production Cassandra keyspace.

In RDS MySQL, we can leverage a read replica so that devs can run queries 
without causing any performance issues on the live database.

How can I grant read-only access on a Cassandra keyspace and also ensure that 
it does not negatively impact the performance of the live cluster?

Thanks,
Bhavesh


RE: [DISCUSS] Improve Commitlog write path

2022-07-22 Thread Pawar, Amit
[AMD Official Use Only - General]

Hi Scott,

Thank you for your reply. I didn’t mean to argue, and I am sorry if it came 
across that way.

I see that compaction is a complex activity once the data grows too big, and 
at some peak point the improvement will definitely fade on some workloads due 
to various factors. I will leave it to the community to decide about this 
patch. TLP-stress testing has not been done yet; I will look into how to do it.

Regarding direct I/O, it is interesting, and I need to think about how to 
implement it. Java 11 has native API support for it, but Java 8 does not, so 
some JNI-based implementation would be required to enable it for all Java 
versions. There are some blockers, and I am trying to find out how to 
overcome them. Hopefully I will succeed in implementing it.
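
As an aside on the JNI point: the core difficulty with O_DIRECT is that 
buffers and file offsets must be aligned to the storage block size. On Java 9+ 
an aligned buffer can at least be obtained without JNI via 
ByteBuffer.alignedSlice; the sketch below assumes a 4 KiB block size, and the 
file-open side would still need ExtendedOpenOption.DIRECT (JDK 10+) or JNI on 
older JVMs:

```java
import java.nio.ByteBuffer;

public class AlignedBuffers {
    /**
     * Allocates a direct ByteBuffer of exactly {@code capacity} bytes whose
     * start address is aligned to {@code blockSize}, as O_DIRECT requires.
     * Over-allocates by two block sizes so the aligned slice is big enough.
     */
    static ByteBuffer allocateAligned(int capacity, int blockSize) {
        ByteBuffer raw = ByteBuffer.allocateDirect(capacity + 2 * blockSize);
        ByteBuffer aligned = raw.alignedSlice(blockSize); // Java 9+
        aligned.limit(capacity);                           // trim to requested size
        return aligned.slice();                            // position 0 stays aligned
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocateAligned(64 * 1024, 4096);
        // alignmentOffset(0, unit) is 0 when the buffer start is unit-aligned
        System.out.println(buf.alignmentOffset(0, 4096) == 0);
        System.out.println(buf.capacity() == 64 * 1024);
    }
}
```

For direct I/O the write lengths and file positions must be block-aligned as 
well, which is why commitlog segments are a natural fit: they are 
pre-allocated and written in fixed-size chunks.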

Thanks,
Amit

From: C. Scott Andreas 
Sent: Friday, July 22, 2022 10:24 PM
To: dev@cassandra.apache.org
Cc: Bowen Song ; Raghavendra, Prakash 

Subject: Re: [DISCUSS] Improve Commitlog write path

[CAUTION: External Email]
Amit, welcome and thank you for contributing the results from your test and 
opening this discussion.

I don’t think anyone is arguing that the database shouldn’t take advantage of 
available hardware.

A few things important to keep in mind when considering a patch like this:

- Where the actual bottleneck in the database will be for increased write 
throughput. As Bowen and Benedict mentioned, the amount of work performed by 
the commitlog versus the accrued cost of integrating flushed SSTables into the 
LSM tree is dramatically weighted toward compaction. A multi-day benchmark that 
allows the database to accrue and incorporate a sizable amount of data is much 
more likely to produce measurements that approximate what users of Cassandra 
may experience in production use.

- Making something multi-threaded doesn’t reduce the amount of work done; it 
redistributes it. In a saturated system, this means resources are allocated in 
an environment of trade-offs. Allocating additional resources to the front door 
will reduce the resources available to compaction, live serving, etc. in 
environments where cores are not limitless and free. This is why the holistic 
view of performance others are speaking to is important.

- How such a change alters the balance of the database’s threading model, and 
where the bottleneck moves to. Users who overrun the commitlog’s capability 
today are likely to be even more negatively impacted by compaction overhead if 
backpressure is lost at the front door. The meta-point to consider is “how does 
this change affect the performance characteristics of a live database?”

- We also need to balance complexity and correctness in the implementation. If 
the patch is straightforward, has a well-defined locking scheme, and ideally a 
suite of randomized tests, that can help mitigate concerns related to this.
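
On the locking-scheme point: a common shape for multi-threaded log appends is 
to serialize only the claiming of space, not the copying of data. The sketch 
below is a simplified, hypothetical illustration (not the actual Cassandra 
code) using a CAS on the segment’s tail offset:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified sketch: writer threads claim disjoint regions of one segment. */
class SegmentAllocator {
    private final int segmentSize;
    private final AtomicInteger tail = new AtomicInteger(0);

    SegmentAllocator(int segmentSize) { this.segmentSize = segmentSize; }

    /** Returns the start offset of a claimed region, or -1 if it won't fit. */
    int allocate(int size) {
        while (true) {
            int cur = tail.get();
            if (cur + size > segmentSize)
                return -1; // segment full: caller must switch to a new segment
            if (tail.compareAndSet(cur, cur + size))
                return cur; // region [cur, cur + size) now belongs to this thread
        }
    }
}
```

A thread that receives a non-negative offset can then copy its mutation into 
its region with no further coordination, and a randomized test can hammer 
allocate() from many threads and assert the claimed regions never overlap.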

It sounds like several would welcome such a patch for review. Just want to 
signpost that the gains and tradeoffs aren’t always clear cut, especially in 
cases where the improvement is a rebalancing of the database’s threading model 
rather than reducing the amount of work performed.

The second item you mentioned - a direct IO path for commitlog writes - sounds 
like an interesting potential addition.

One thing that may be useful to post along with your patch is a result from an 
extended tlp-stress run that includes both the live write path as well as the 
deferred compaction of data written.

- Scott
