Re: naming development branches consistently

2020-08-26 Thread Benjamin Lerer
+1 for trunk or main (slight preference for trunk)



On Tue, Aug 25, 2020 at 8:52 PM Mick Semb Wever  wrote:

> +1 for trunk and main.
>
> Thanks for raising this Brandon.
>
>
>
> On Tue, 25 Aug 2020 at 20:40, Cyril Scetbon  wrote:
>
> > Scott, I don’t think it does, and I don’t see any offense in what I’ve said.
> > Can you be more specific instead of sending a link with rule numbers that
> > I don’t think apply? I can also send links to definitions of all the words
> > in the rules you sent, but it could be a waste of time. We can continue that
> > discussion on the list or in private, up to you.
> >
> > Cyril Scetbon
> >
> > > On Aug 25, 2020, at 11:50 AM, Scott Hirleman  wrote:
> > >
> > > Cyril, your commentary violates the code of conduct of the ASF
> > > re rules #2 and #5. It could also be very hurtful to those who feel
> > > this is important, as many in the community do. Please think before
> > > pushing further on this as projects need to be welcoming and inclusive.
> > > Thank you.
> > >
> > >> On Mon, Aug 24, 2020 at 4:00 PM Cyril Scetbon  wrote:
> > >>
> > >> Wow, thanks for the link Almero. I suppose the dictionary comes next
> > >> then 🤷‍♂️
> > >>
> > >>> On Aug 24, 2020, at 6:54 PM, Gouws, Almero  wrote:
> > >>>
> > >>> https://www.zdnet.com/article/mysql-drops-master-slave-and-blacklist-whitelist-terminology/
> > >>>
> > >>> -Almero
> > >>>
> > >>> -Original Message-
> > >>> From: Cyril Scetbon 
> > >>> Sent: Monday, August 24, 2020 3:47 PM
> > >>> To: dev@cassandra.apache.org
> > >>> Subject: RE: [EXTERNAL] naming development branches consistently
> > >>>
> > >>> Seriously? Should we change how MySQL architectures are defined?
> > >>> Should we remove it from the dictionary too? Just to see how radical it
> > >>> could be … 🤦‍♂️
> > >>>
> >  On Aug 24, 2020, at 12:12 PM, Brandon Williams  wrote:
> > 
> >  With the current social climate I thought removing the master
> >  reference rather than proliferating it would be better.
> > 
> >  On Mon, Aug 24, 2020 at 11:07 AM Joshua McKenzie <jmcken...@apache.org> wrote:
> > >
> > > Why not rename "trunk" to "master" in C*?   =D
> > >
> > > On Mon, Aug 24, 2020 at 11:17 AM Brandon Williams <dri...@gmail.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> Currently in the cassandra repo our development branch is named
> > >> 'trunk', but this is not consistently used in other repos, such as
> > >> cassandra-dtest, -builds, -website, and probably others, which use
> > >> 'master' instead.  I propose we rename all of those to 'trunk' to
> > >> match.
> > >>
> > >> Kind Regards,
> > >> Brandon
> > >>
> >
>
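Brandon's proposal amounts to renaming the development branch of each affected repository from 'master' to 'trunk'. A minimal sketch, assuming each repo is cloned locally with a remote called `origin`, and assuming the default branch is switched on the hosting side before the old branch is deleted. The helper names `rename_plan` and `run` are hypothetical; only the `git` subcommands themselves are standard.

```python
import subprocess
from typing import List

# Repositories named in the proposal; "probably others" may need adding.
REPOS = ["cassandra-dtest", "cassandra-builds", "cassandra-website"]


def rename_plan(old: str = "master", new: str = "trunk") -> List[List[str]]:
    """Return the git commands that rename a branch locally and on `origin`."""
    return [
        ["git", "fetch", "origin"],
        ["git", "branch", "-m", old, new],      # rename the local branch
        ["git", "push", "-u", "origin", new],   # publish it and track it
        # Deleting the old remote branch should only happen after the
        # repository's default branch has been switched to `new`.
        ["git", "push", "origin", "--delete", old],
    ]


def run(repo_dir: str, execute: bool = False) -> List[str]:
    """Print the plan for one checkout (dry run), or execute it."""
    shown = []
    for cmd in rename_plan():
        line = " ".join(cmd)
        shown.append(line)
        print(f"[{repo_dir}] {line}")
        if execute:
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return shown


if __name__ == "__main__":
    for repo in REPOS:
        run(repo)  # dry run by default; pass execute=True to apply
```

Existing clones could then follow the rename with `git branch -m master trunk`, `git fetch origin`, `git branch -u origin/trunk trunk` and `git remote set-head origin -a`.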


Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-26 Thread Patrick McFadin
This is related to the discussion Jordan and I had about the contributor
Zoom call. Instead of open mic for any issue, call it based on a discussion
thread or threads for higher bandwidth discussion.

I would be happy to schedule one for next week to specifically discuss
CEP-7. I can attach the recorded call to the CEP after.

+1 or -1?

Patrick

On Tue, Aug 25, 2020 at 7:03 AM Joshua McKenzie  wrote:

> >
> > Does the community plan to open another discussion or CEP on modularization?
>
> We probably should have a discussion on the ML or monthly contrib call
> about it first to see how aligned the interested contributors are. Could do
> that through a CEP as well, but CEPs (at least thus far, sans the k8s operator)
> tend to start with a strong, deeply thought out point of view being
> expressed.
>
> On Tue, Aug 25, 2020 at 3:26 AM Jasonstack Zhao Yang <jasonstack.z...@gmail.com> wrote:
>
> > >>> SASI's performance, specifically the search in the B+ tree component,
> > >>> depends a lot on the component file's header being available in the
> > >>> pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI bound
> > >>> to this same or similar limitation?
> >
> > SAI also benefits from more memory: SAI keeps block info on heap for
> > searching on-disk components, and having the cross-index files in the page
> > cache improves read performance of different indexes on the same table.
> >
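The answer above says SAI keeps block info on heap to search on-disk components. The general technique is a sparse in-memory index: keep only the first term of each block, binary-search that small list, then read a single block. A minimal sketch, where `BlockIndexedTerms` is hypothetical and plain Python lists stand in for on-disk blocks; it is not SAI's actual structure.

```python
import bisect
from typing import List, Sequence


class BlockIndexedTerms:
    """Sorted terms split into fixed-size blocks plus a small on-heap summary."""

    def __init__(self, sorted_terms: Sequence[int], block_size: int) -> None:
        # Stand-in for on-disk blocks; a real index would read these lazily.
        self._blocks: List[List[int]] = [
            list(sorted_terms[i:i + block_size])
            for i in range(0, len(sorted_terms), block_size)
        ]
        # The on-heap part: only the first term of each block.
        self.first_terms = [block[0] for block in self._blocks]
        self.block_reads = 0

    def contains(self, term: int) -> bool:
        # Find the last block whose first term is <= term.
        i = bisect.bisect_right(self.first_terms, term) - 1
        if i < 0:
            return False  # smaller than every term: no block read needed
        self.block_reads += 1  # the single "disk" access for this lookup
        block = self._blocks[i]
        j = bisect.bisect_left(block, term)
        return j < len(block) and block[j] == term


idx = BlockIndexedTerms(list(range(0, 1000, 2)), block_size=64)
print(len(idx.first_terms))  # 8 summary entries cover 500 terms
```

Only the summary has to live on heap; each lookup still touches one block, which is why keeping those blocks in the page cache helps as well.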
> >
> > >>> Flushing of SASI can be CPU+IO intensive, to the point of saturation,
> > >>> pauses, and crashes on the node. SSDs are a must, along with a bit of
> > >>> tuning, just to avoid bringing down your cluster. Beyond reducing space
> > >>> requirements, does SAI improve on these things? Like SASI how does SAI, in
> > >>> its own way, change/narrow the recommendations on node hardware specs?
> >
> > SAI won't crash the node during compaction and requires less CPU/IO.
> >
> > * SAI defines a global memory limit for compaction instead of the
> >   per-index memory limit used by SASI.
> >   For example, if compactions are running on 10 tables and each has 10
> >   indexes, SAI will cap the memory usage with the global limit, while SASI
> >   may use up to 100 * the per-index limit.
> >
> > * After flushing in-memory segments to disk, SAI won't merge the on-disk
> >   segments, while SASI attempts to merge them at the end.
> >
> >   There are pros and cons of not merging segments:
> > ** Pros: compaction runs faster and requires fewer resources.
> > ** Cons: small segments reduce compression ratio.
> >
> > * SAI on-disk format with row ids compresses better.
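The first bullet above contrasts one shared cap with per-index caps. With 10 tables × 10 indexes = 100 indexes, a per-index cap of L allows up to 100 × L in aggregate, whereas a global cap bounds the total no matter how many indexes are building. A minimal single-threaded sketch, assuming a hypothetical policy of flushing the largest in-memory buffer when the shared budget is exhausted; the class names and the policy are illustrative, not SAI's.

```python
from typing import List


class GlobalMemoryLimiter:
    """One memory budget shared by every index builder (hypothetical)."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.used = 0
        self.peak = 0
        self.builders: List["SegmentBuilder"] = []

    def reserve(self, n: int) -> None:
        if n > self.limit_bytes:
            raise ValueError("entry larger than the global limit")
        # Assumed policy: free memory by flushing the largest buffer first.
        while self.used + n > self.limit_bytes:
            max(self.builders, key=lambda b: b.buffered).flush()
        self.used += n
        self.peak = max(self.peak, self.used)

    def release(self, n: int) -> None:
        self.used -= n


class SegmentBuilder:
    """Buffers one index's entries and flushes them as an on-disk segment."""

    def __init__(self, limiter: GlobalMemoryLimiter) -> None:
        self.limiter = limiter
        self.buffered = 0
        self.segments_flushed = 0
        limiter.builders.append(self)

    def add(self, entry_bytes: int) -> None:
        self.limiter.reserve(entry_bytes)
        self.buffered += entry_bytes

    def flush(self) -> None:
        if self.buffered:
            # A real builder would write a segment file here.
            self.limiter.release(self.buffered)
            self.buffered = 0
            self.segments_flushed += 1


def simulate(tables=10, indexes_per_table=10, entries=50, entry_bytes=10,
             global_limit=1_000):
    limiter = GlobalMemoryLimiter(global_limit)
    builders = [SegmentBuilder(limiter) for _ in range(tables * indexes_per_table)]
    for _ in range(entries):
        for builder in builders:
            builder.add(entry_bytes)
    return limiter, builders


limiter, builders = simulate()
print(limiter.peak)  # 1000: total demand is 50,000 bytes, the peak stays at the cap
```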
> >
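The second bullet's trade-off can be illustrated directly: if flushed segments are never merged, compaction writes each segment once, but a query has to consult every segment and combine the matches. A minimal sketch, where each segment is a hypothetical sorted list of `(term, row_id)` pairs and results are combined with `heapq.merge`; SAI's real segment format differs.

```python
import bisect
import heapq
from typing import Iterable, List, Tuple

Segment = List[Tuple[int, int]]  # sorted (term, row_id) pairs


def flush_segments(entries: Iterable[Tuple[int, int]],
                   segment_size: int) -> List[Segment]:
    """Cut incoming entries into independently sorted segments (no merge)."""
    segments, buffer = [], []
    for entry in entries:
        buffer.append(entry)
        if len(buffer) == segment_size:
            segments.append(sorted(buffer))
            buffer = []
    if buffer:
        segments.append(sorted(buffer))
    return segments


def range_query(segments: List[Segment], lo: int, hi: int) -> List[int]:
    """Row ids whose term is in [lo, hi], searching every segment."""
    hits = []
    for seg in segments:
        start = bisect.bisect_left(seg, (lo, float("-inf")))
        end = bisect.bisect_right(seg, (hi, float("inf")))
        hits.append(seg[start:end])
    # Each slice is sorted, so a k-way merge keeps the result in term order.
    return [row_id for _, row_id in heapq.merge(*hits)]
```

Merging segments would reduce each query to one binary search, at the cost of rewriting data during compaction. The compression effect in the "Cons" line is not modeled here.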
> >
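For the third bullet, one common reason row-id-based postings compress well is that sorted integer ids can be stored as small gaps. A minimal sketch of delta plus LEB128-style varint encoding; it illustrates the general technique under that assumption and does not describe SAI's actual on-disk format.

```python
from typing import List, Sequence


def encode_row_ids(row_ids: Sequence[int]) -> bytes:
    """Delta-encode ascending row ids, then write each gap as a varint."""
    out = bytearray()
    prev = 0
    for rid in row_ids:
        if rid < prev:
            raise ValueError("row ids must be non-negative and sorted ascending")
        gap = rid - prev
        prev = rid
        while gap >= 0x80:  # 7 payload bits per byte, high bit means "more"
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_row_ids(data: bytes) -> List[int]:
    """Invert encode_row_ids: rebuild gaps from varints and sum them."""
    row_ids, value, shift, prev = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
            continue
        prev += value
        row_ids.append(prev)
        value, shift = 0, 0
    return row_ids


ids = list(range(0, 1000, 3))  # 334 dense row ids
print(len(encode_row_ids(ids)), len(ids) * 8)  # 334 2672 (varint vs 64-bit)
```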
> > >>> I understand the desire in keeping out of scope the longer term deprecation
> > >>> and migration plan, but… if SASI provides functionality that SAI doesn't,
> > >>> like tokenisation and DelimiterAnalyzer, yet introduces a body of code
> > >>> ~somewhat similar, shouldn't we be roughly sketching out how to reduce the
> > >>> maintenance surface area?
> >
> > Agreed that we should reduce the maintenance surface area if possible, but
> > only a very limited part of the code base (e.g. RangeIterator, QueryPlan)
> > can be shared. The rest of the code base is quite different because of the
> > on-disk format and cross-index files.
> >
> > The goal of this CEP is to get community buy-in on SAI's design.
> > Tokenization and DelimiterAnalyzer should be straightforward to implement
> > on top of SAI.
> >
> > >>> Can we list what configurations of SASI will become deprecated once SAI
> > >>> becomes non-experimental?
> >
> > Except for "Like", "Tokenisation", "DelimiterAnalyzer", the rest of SASI can
> > be replaced by SAI.
> >
> > >>> Given a few bugs are open against 2i and SASI, can we provide some
> > >>> overview, or rough indication, of how many of them we could "triage away"?
> >
> > I believe most of the known bugs in 2i/SASI either have been addressed in SAI
> > or don't apply to SAI.
> >
> > >>> And, is it time for the project to start introducing new SPI
> > >>> implementations as separate sub-modules and jar files that are only loaded
> > >>> at runtime based on configuration settings? (sorry for the conflation on
> > >>> this one, but maybe it's the right time to raise it :shrug:)
> >
> > Agreed that modularization is the way to go; it will speed up module
> > development.
> >
> > Does the community plan to open another discussion or CEP on modularization?
> >
> >
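Mick's question is about shipping index implementations as separate modules that are only loaded at runtime based on configuration. In Java this is commonly done with `java.util.ServiceLoader` or by instantiating a configured class name; the Python analog below is a minimal sketch of the same idea using `importlib`. The `module:Class` setting format and `load_implementation` are hypothetical.

```python
import importlib
from typing import Type, TypeVar

T = TypeVar("T")


def load_implementation(spec: str, expected_base: Type[T]) -> Type[T]:
    """Resolve a 'package.module:ClassName' setting to a class at runtime.

    The module is only imported when the configuration names it, so an
    implementation that is not configured never needs to be installed.
    """
    module_name, sep, class_name = spec.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module:Class', got {spec!r}")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    if not (isinstance(cls, type) and issubclass(cls, expected_base)):
        raise TypeError(f"{spec} does not implement {expected_base.__name__}")
    return cls


# A real setting might name a hypothetical plugin such as "my_sai_plugin:SaiIndex";
# a standard-library class is used here so the snippet runs anywhere.
print(load_implementation("collections:OrderedDict", dict))
```

Checking the loaded class against an expected base type keeps a misconfigured setting from surfacing later as an obscure runtime failure.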
> > On Mon, 24 Aug 2020 at 16:43, Mick Semb Wever  wrote:
> >
> > > Adding to Duy's questions…
> > >
> > >
> > > * Hardware specs
> > >
> > > SASI's performance, specifically the search in the B+ tree component,
> > > depends a lot on the component file's header being available in the
> > > pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI bound
> > > to this same or similar limitation?
> > >
> > > Flushing of SASI can be CPU+IO intensive, to the point of saturation,
> > > pauses, and crashes on the node. SSDs are a must, along with a bit of

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-26 Thread Caleb Rackliffe
+1

On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin  wrote:

> This is related to the discussion Jordan and I had about the contributor
> Zoom call. Instead of open mic for any issue, call it based on a discussion
> thread or threads for higher bandwidth discussion.
>
> I would be happy to schedule one for next week to specifically discuss
> CEP-7. I can attach the recorded call to the CEP after.
>
> +1 or -1?
>
> Patrick

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-26 Thread Ekaterina Dimitrova
+1

On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe  wrote:

> +1
>
> On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin  wrote:
>
> > This is related to the discussion Jordan and I had about the contributor
> > Zoom call. Instead of open mic for any issue, call it based on a discussion
> > thread or threads for higher bandwidth discussion.
> >
> > I would be happy to schedule one for next week to specifically discuss
> > CEP-7. I can attach the recorded call to the CEP after.
> >
> > +1 or -1?
> >
> > Patrick
> >