Re: [DISCUSS] CEP-19: Trie memtable implementation

2022-01-18 Thread Branimir Lambov
The memtable pluggability API (CEP-11) is per-table to enable memtable
selection that suits specific workflows. It also makes full sense to permit
per-node configuration, both to adapt the configuration to heterogeneous
deployments and to test changes for improvements such as this one.
Recognizing this, the patch comes with a modification to the API
that defines memtable templates in cassandra.yaml (i.e. per node) and
allows the schema to select a template (in addition to being able to
specify the full memtable configuration). One could use this e.g. by adding:

memtable_templates:
    trie:
        class: TrieMemtable
        shards: 16
    skiplist:
        class: SkipListMemtable
memtable:
    template: skiplist

(which defines two templates and specifies the default memtable
implementation to use) to cassandra.yaml and specifying WITH memtable =
{'template' : 'trie'} in the table schema.
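
As a sketch of the schema side, a table that opts into the trie template
could be declared as below. The keyspace, table and column names are
hypothetical, and the option syntax is the one proposed in this message, so
it may change before the API is committed.

```sql
-- Hypothetical table; only the WITH clause comes from the proposal above.
CREATE TABLE example_ks.events (
    id uuid PRIMARY KEY,
    payload text
) WITH memtable = {'template' : 'trie'};
```

Tables that do not specify a memtable option would then fall back to the
node's default, the skiplist template in the cassandra.yaml fragment above.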

I intend to commit this modification with the memtable API
(CASSANDRA-17034/CEP-11).

Performance comparisons will be published soon.

Regards,
Branimir

On Fri, Jan 14, 2022 at 4:15 PM Jeff Jirsa wrote:

> Sounds like a great addition
>
> Can you share some of the details around gc and latency improvements
> you’ve observed with the list?
>
> Any specific reason the configuration is through schema vs yaml? Presumably
> it’s so a user can test per table, but this changes every host in a
> cluster, so the impact of a bug/regression is much higher.
>
>
> On Jan 10, 2022, at 1:30 AM, Branimir Lambov wrote:
>
> 
> We would like to contribute our TrieMemtable to Cassandra.
>
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-19%3A+Trie+memtable+implementation
>
> This is a new memtable solution aimed to replace the legacy
> implementation, developed with the following objectives:
> - lowering the on-heap complexity and the ability to store memtable
> indexing structures off-heap,
> - leveraging byte order and a trie structure to lower the memory footprint
> and improve mutation and lookup performance.
>
> The new memtable relies on CASSANDRA-6936 to translate to and from
> byte-ordered representations of types, and CASSANDRA-17034 / CEP-11 to plug
> into Cassandra. The memtable is built on multiple shards of custom
> in-memory single-writer multiple-reader tries, whose implementation uses a
> combination of state-of-the-art and novel features for greater efficiency.
>
> The CEP's JIRA ticket (
> https://issues.apache.org/jira/browse/CASSANDRA-17240) contains the
> initial version of the implementation. In its current form it achieves much
> better garbage collection latency, significantly bigger data sizes between
> flushes for the same memory allocation, as well as drastically increased
> write throughput, and we expect the memory and garbage collection
> improvements to go much further with upcoming improvements to the solution.
>
> I am interested in hearing your thoughts on the proposal.
>
> Regards,
> Branimir
>
>


UDF future

2022-01-18 Thread Ekaterina Dimitrova
Hi everyone,

With the work to add Java 17 support for Cassandra, a new question around
the future of UDF was raised. The scripted UDF was using Nashorn which is
no longer packaged with the JDK. There are options to add new dependencies,
on Graal JS for example, but it seems people are not sure that it is worth
it. Please check the discussion on CASSANDRA-16895.

The following suggestion was made by Marcus and supported by other PMC
members - "I think we should deprecate scripted UDFs now and drop them from
the next major, but possibly provide hooks for people to write their own
UDF "engines" and break out the current javascript implementation in to its
own repository (but not ship it with Cassandra)."

As a result we decided to engage with our users and created a Twitter
survey. Results below:

*We would love to understand how you use ApacheCassandra UDFs and UDAs.*

*32 people responded as follows:*

   - *We do not use them - 75%*
   - *We only use Java UDFs - 22%*
   - *We only use JS UDFs - 0%*
   - *We use Java and JS UDFs - 3%*

We also received feedback on LinkedIn on the topic -
https://www.linkedin.com/feed/update/urn:li:activity:6886728406742970369?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A6886728406742970369%2C6886793921020608512%29&replyUrn=urn%3Ali%3Acomment%3A%28activity%3A6886728406742970369%2C6887421509485248512%29

Thoughts?

Best regards,
Ekaterina


Re: UDF future

2022-01-18 Thread C. Scott Andreas
I also (+1nb) support a proposal to deprecate JavaScript UDFs; to offer an 
interface for those who would like to supply a UDF implementation; and to 
extract/remove our current implementation.

JDK17 support seems like a much higher priority than in-tree JS UDFs.

— Scott



Re: UDF future

2022-01-18 Thread Jonathan Ellis
+1

On Tue, Jan 18, 2022 at 10:34 AM C. Scott Andreas wrote:

> I also (+1nb) support a proposal to deprecate JavaScript UDFs; to offer an
> interface for those who would like to supply a UDF implementation; and to
> extract/remove our current implementation.
>
> JDK17 support seems like a much higher priority than in-tree JS UDFs.
>
> — Scott
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: UDF future

2022-01-18 Thread Jeff Jirsa

+1

> On Jan 18, 2022, at 8:38 AM, Jonathan Ellis wrote:
> 
> 
> +1
> 
>> On Tue, Jan 18, 2022 at 10:34 AM C. Scott Andreas wrote:
>> I also (+1nb) support a proposal to deprecate JavaScript UDFs; to offer an 
>> interface for those who would like to supply a UDF implementation; and to 
>> extract/remove our current implementation.
>> 
>> JDK17 support seems like a much higher priority than in-tree JS UDFs.
>> 
>> — Scott
>> 
> 
> 
> -- 
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced