> Short question: looking forward, how are we going to maintain three 2i
> implementations: SASI, SAI, and 2i?
I think one of the goals stated in the CEP is for SAI to have parity with 2i such that it could eventually replace it.

> On Sep 23, 2020, at 10:34 AM, Oleksandr Petrov <oleksandr.pet...@gmail.com> wrote:
>
> Short question: looking forward, how are we going to maintain three 2i
> implementations: SASI, SAI, and 2i?
>
> Another thing I think this CEP is missing is rationale and motivation
> about why trie-based indexes were chosen over, say, B-Tree. We did have a
> short discussion about this on Slack, but both arguments that I've heard
> (space-saving and keeping a small subset of nodes in memory) work only for
> the most primitive implementation of a B-Tree. Fully-occupied prefix B-Tree
> can have similar properties. There's been a lot of research on B-Trees and
> optimisations in those. Unfortunately, I do not have an
> implementation sitting around for a direct comparison, but I can imagine
> situations when B-Trees may perform better because of simpler construction.
> Maybe we should even consider prototyping a prefix B-Tree to have a more
> fair comparison.
>
> Thank you,
> -- Alex
>
> On Thu, Sep 10, 2020 at 9:12 AM Jasonstack Zhao Yang <jasonstack.z...@gmail.com> wrote:
>
>> Thank you Patrick for hosting Cassandra Contributor Meeting for CEP-7 SAI.
>>
>> The recorded video is available here:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-09-01+Apache+Cassandra+Contributor+Meeting
>>
>> On Tue, 1 Sep 2020 at 14:34, Jasonstack Zhao Yang <jasonstack.z...@gmail.com> wrote:
>>
>>> Thank you, Charles and Patrick
>>>
>>> On Tue, 1 Sep 2020 at 04:56, Charles Cao <caohair...@gmail.com> wrote:
>>>
>>>> Thank you, Patrick!
>>>>
>>>> On Mon, Aug 31, 2020 at 12:59 PM Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>>
>>>>> I just moved it to 8AM for this meeting to better accommodate APAC.
>>>>> Please see the update here:
>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Mon, Aug 31, 2020 at 10:04 AM Charles Cao <caohair...@gmail.com> wrote:
>>>>>
>>>>>> Patrick,
>>>>>>
>>>>>> 11AM PST is a bad time for the people in the APAC timezone. Can we
>>>>>> move it to 7 or 8AM PST in the morning to accommodate their needs?
>>>>>>
>>>>>> ~Charles
>>>>>>
>>>>>> On Fri, Aug 28, 2020 at 4:37 PM Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>>>>
>>>>>>> Meeting scheduled.
>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
>>>>>>>
>>>>>>> Tuesday September 1st, 11AM PST. I added a basic bullet for the agenda but
>>>>>>> if there is more, edit away.
>>>>>>>
>>>>>>> Patrick
>>>>>>>
>>>>>>> On Thu, Aug 27, 2020 at 11:31 AM Jasonstack Zhao Yang <jasonstack.z...@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> This is related to the discussion Jordan and I had about the contributor
>>>>>>>>>>> Zoom call. Instead of open mic for any issue, call it based on a discussion
>>>>>>>>>>> thread or threads for higher bandwidth discussion.
>>>>>>>>>>>
>>>>>>>>>>> I would be happy to schedule one for next week to specifically discuss
>>>>>>>>>>> CEP-7. I can attach the recorded call to the CEP after.
>>>>>>>>>>>
>>>>>>>>>>> +1 or -1?
>>>>>>>>>>>
>>>>>>>>>>> Patrick
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 25, 2020 at 7:03 AM Joshua McKenzie <jmcken...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>> Does community plan to open another discussion or CEP on modularization?
>>>>>>>>>>>>
>>>>>>>>>>>> We probably should have a discussion on the ML or monthly contrib call
>>>>>>>>>>>> about it first to see how aligned the interested contributors are. Could do
>>>>>>>>>>>> that through CEP as well but CEP's (at least thus far sans k8s operator)
>>>>>>>>>>>> tend to start with a strong, deeply thought out point of view being
>>>>>>>>>>>> expressed.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 25, 2020 at 3:26 AM Jasonstack Zhao Yang <jasonstack.z...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>>>> SASI's performance, specifically the search in the B+ tree component,
>>>>>>>>>>>>>>>> depends a lot on the component file's header being available in the
>>>>>>>>>>>>>>>> pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI bound
>>>>>>>>>>>>>>>> to this same or similar limitation?
>>>>>>>>>>>>>
>>>>>>>>>>>>> SAI also benefits from larger memory because SAI puts block info on heap
>>>>>>>>>>>>> for searching on-disk components and having cross-index files on page cache
>>>>>>>>>>>>> improves read performance of different indexes on the same table.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Flushing of SASI can be CPU+IO intensive, to the point of saturation,
>>>>>>>>>>>>>>>> pauses, and crashes on the node. SSDs are a must, along with a bit of
>>>>>>>>>>>>>>>> tuning, just to avoid bringing down your cluster. Beyond reducing space
>>>>>>>>>>>>>>>> requirements, does SAI improve on these things? Like SASI how does SAI, in
>>>>>>>>>>>>>>>> its own way, change/narrow the recommendations on node hardware specs?
>>>>>>>>>>>>>
>>>>>>>>>>>>> SAI won't crash the node during compaction and requires less CPU/IO.
>>>>>>>>>>>>>
>>>>>>>>>>>>> * SAI defines global memory limit for compaction instead of per-index
>>>>>>>>>>>>> memory limit used by SASI.
>>>>>>>>>>>>>   For example, compactions are running on 10 tables and each has 10
>>>>>>>>>>>>>   indexes. SAI will cap the memory usage with global limit while SASI
>>>>>>>>>>>>>   may use up to 100 * per-index limit.
>>>>>>>>>>>>>
>>>>>>>>>>>>> * After flushing in-memory segments to disk, SAI won't merge on-disk
>>>>>>>>>>>>> segments while SASI attempts to merge them at the end.
>>>>>>>>>>>>>
>>>>>>>>>>>>>   There are pros and cons of not merging segments:
>>>>>>>>>>>>>   ** Pros: compaction runs faster and requires fewer resources.
>>>>>>>>>>>>>   ** Cons: small segments reduce compression ratio.
>>>>>>>>>>>>>
>>>>>>>>>>>>> * SAI on-disk format with row ids compresses better.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I understand the desire in keeping out of scope the longer term deprecation
>>>>>>>>>>>>>>>> and migration plan, but… if SASI provides functionality that SAI doesn't,
>>>>>>>>>>>>>>>> like tokenisation and DelimiterAnalyzer, yet introduces a body of code
>>>>>>>>>>>>>>>> ~somewhat similar, shouldn't we be roughly sketching out how to reduce the
>>>>>>>>>>>>>>>> maintenance surface area?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Agreed that we should reduce maintenance area if possible, but only very
>>>>>>>>>>>>> limited code base (eg. RangeIterator, QueryPlan) can be shared. The rest of
>>>>>>>>>>>>> the code base is quite different because of on-disk format and cross-index
>>>>>>>>>>>>> files.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The goal of this CEP is to get community buy-in on SAI's design.
>>>>>>>>>>>>> Tokenization, DelimiterAnalyzer should be straightforward to implement on
>>>>>>>>>>>>> top of SAI.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can we list what configurations of SASI will become deprecated once SAI
>>>>>>>>>>>>>>>> becomes non-experimental?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Except for "Like", "Tokenisation", "DelimiterAnalyzer", the rest of SASI
>>>>>>>>>>>>> can be replaced by SAI.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Given a few bugs are open against 2i and SASI, can we provide some
>>>>>>>>>>>>>>>> overview, or rough indication, of how many of them we could "triage away"?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I believe most of the known bugs in 2i/SASI either have been addressed in
>>>>>>>>>>>>> SAI or don't apply to SAI.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> And, is it time for the project to start introducing new SPI
>>>>>>>>>>>>>>>> implementations as separate sub-modules and jar files that are only loaded
>>>>>>>>>>>>>>>> at runtime based on configuration settings? (sorry for the conflation on
>>>>>>>>>>>>>>>> this one, but maybe it's the right time to raise it :shrug:)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Agreed that modularization is the way to go and will speed up module
>>>>>>>>>>>>> development speed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does community plan to open another discussion or CEP on modularization?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 24 Aug 2020 at 16:43, Mick Semb Wever <m...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Adding to Duy's questions…
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * Hardware specs
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> SASI's performance, specifically the search in the B+ tree component,
>>>>>>>>>>>>>> depends a lot on the component file's header being available in the
>>>>>>>>>>>>>> pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI bound
>>>>>>>>>>>>>> to this same or similar limitation?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Flushing of SASI can be CPU+IO intensive, to the point of saturation,
>>>>>>>>>>>>>> pauses, and crashes on the node. SSDs are a must, along with a bit of
>>>>>>>>>>>>>> tuning, just to avoid bringing down your cluster. Beyond reducing space
>>>>>>>>>>>>>> requirements, does SAI improve on these things? Like SASI how does SAI, in
>>>>>>>>>>>>>> its own way, change/narrow the recommendations on node hardware specs?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * Code Maintenance
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I understand the desire in keeping out of scope the longer term deprecation
>>>>>>>>>>>>>> and migration plan, but… if SASI provides functionality that SAI doesn't,
>>>>>>>>>>>>>> like tokenisation and DelimiterAnalyzer, yet introduces a body of code
>>>>>>>>>>>>>> ~somewhat similar, shouldn't we be roughly sketching out how to reduce the
>>>>>>>>>>>>>> maintenance surface area?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can we list what configurations of SASI will become deprecated once SAI
>>>>>>>>>>>>>> becomes non-experimental?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Given a few bugs are open against 2i and SASI, can we provide some
>>>>>>>>>>>>>> overview, or rough indication, of how many of them we could "triage away"?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And, is it time for the project to start introducing new SPI
>>>>>>>>>>>>>> implementations as separate sub-modules and jar files that are only loaded
>>>>>>>>>>>>>> at runtime based on configuration settings?
>>>>>>>>>>>>>> (sorry for the conflation on
>>>>>>>>>>>>>> this one, but maybe it's the right time to raise it :shrug:)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> regards,
>>>>>>>>>>>>>> Mick
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 18 Aug 2020 at 13:05, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you Zhao Yang for starting this topic
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> After reading the short design doc, I have a few questions
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) SASI was pretty inefficient indexing wide partitions because the index
>>>>>>>>>>>>>>> structure only retains the partition token, not the clustering columns. As
>>>>>>>>>>>>>>> per design doc SAI has row id mapping to partition offset, can we hope that
>>>>>>>>>>>>>>> indexing wide partition will be more efficient with SAI? One detail that
>>>>>>>>>>>>>>> worries me is that in the beginning of the design doc, it is said that the
>>>>>>>>>>>>>>> matching rows are post filtered while scanning the partition. Can you
>>>>>>>>>>>>>>> confirm or deny that SAI is efficient with wide partitions and provides
>>>>>>>>>>>>>>> the partition offsets to the matching rows?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) About space efficiency, one of the biggest drawbacks of SASI was the huge
>>>>>>>>>>>>>>> space required for index structure when using CONTAINS logic because of the
>>>>>>>>>>>>>>> decomposition of text columns into n-grams. Will SAI suffer from the same
>>>>>>>>>>>>>>> issue in future iterations? I'm anticipating a bit
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3) If I'm querying using SAI and providing complete partition key, will it
>>>>>>>>>>>>>>> be more efficient than querying without partition key? In other words, does
>>>>>>>>>>>>>>> SAI provide any optimisation when partition key is specified?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Duy Hai DOAN
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, 18 Aug 2020 at 11:39, Mick Semb Wever <m...@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We are looking forward to the community's feedback and suggestions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What comes immediately to mind is testing requirements.
>>>>>>>>>>>>>>>> It has been
>>>>>>>>>>>>>>>> mentioned already that the project's testability and QA guidelines are
>>>>>>>>>>>>>>>> inadequate to successfully introduce new features and refactorings to the
>>>>>>>>>>>>>>>> codebase. During the 4.0 beta phase this was intended to be addressed, i.e.
>>>>>>>>>>>>>>>> defining more specific QA guidelines for 4.0-rc. This would be an important
>>>>>>>>>>>>>>>> step towards QA guidelines for all changes and CEPs post-4.0.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Questions from me
>>>>>>>>>>>>>>>> - How will this be tested, how will its QA status and lifecycle be
>>>>>>>>>>>>>>>> defined? (per above)
>>>>>>>>>>>>>>>> - With existing C* code needing to be changed, what is the proposed plan
>>>>>>>>>>>>>>>> for making those changes ensuring maintained QA, e.g. are there separate QA
>>>>>>>>>>>>>>>> cycles planned for altering the SPI before adding a new SPI implementation?
>>>>>>>>>>>>>>>> - Despite being out of scope, it would be nice to have some idea from the
>>>>>>>>>>>>>>>> CEP author of when users might still choose afresh 2i or SASI over SAI,
>>>>>>>>>>>>>>>> - Who fills the roles involved?
>>>>>>>>>>>>>>>> Who are the contributors in this DataStax
>>>>>>>>>>>>>>>> team? Who is the shepherd? Are there other stakeholders willing to be
>>>>>>>>>>>>>>>> involved?
>>>>>>>>>>>>>>>> - Is there a preference to use gdoc instead of the project's wiki, and
>>>>>>>>>>>>>>>> why? (the CEP process suggests a wiki page, and feedback on why another
>>>>>>>>>>>>>>>> approach is considered better helps evolve the CEP process itself)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> cheers,
>>>>>>>>>>>>>>>> Mick
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
> --
> alex p