Re: [GSOC] Call for Mentors

2022-02-14 Thread Benjamin Lerer
Hi Paulo,

I would like to propose CASSANDRA-17380 (Adds support for EXPLAIN
statements) as a project for this year's GSOC.

On Fri, Feb 11, 2022 at 19:54, Paulo Motta wrote:

> Unfortunately we didn't, so far.
>
> On Fri, Feb 11, 2022 at 15:32, Henrik Ingo <henrik.i...@datastax.com> wrote:
>
>> Hi Paulo
>>
>> Just checking, am I using Jira right:
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20labels%20%3D%20gsoc%20and%20statusCategory%20!%3D%20Done%20
>>
>> It looks like we ended up with no gsoc projects submitted? Or am I
>> querying wrong?
>>
>> henrik
>>
>> On Thu, Feb 3, 2022 at 12:26 AM Paulo Motta 
>> wrote:
>>
>>> Hi Henrik,
>>>
>>> I am happy to give feedback to project ideas - but they ultimately need
>>> to be registered by prospective mentors on JIRA with the "gsoc" tag to be
>>> considered a "subscribed idea".
>>>
>>> The project idea JIRA should have a "high level" overview of what the
>>> project is:
>>> - What is the problem statement?
>>> - Rough plan on how to approach the problem.
>>> - What are the main milestones/deliverables? (e.g.
>>> code/benchmark/framework/blog post etc.)
>>> - What prior knowledge is required to complete the task?
>>> - What warm-up tasks can the candidate do to ramp up for the project?
>>>
>>> The mentor will work with potential participants to refine the high
>>> level description into smaller subtasks at a later stage (during candidate
>>> application period).
>>>
>>> Cheers,
>>>
>>> Paulo
>>>
>>> On Wed, Feb 2, 2022 at 19:02, Henrik Ingo <henrik.i...@datastax.com> wrote:
>>>
 Hi Paulo

 I think Shaunak and Aleks V already pinged you on Slack about their
 ideas. When you say we don't have any subscribed ideas, what is missing?

 henrik

 On Wed, Feb 2, 2022 at 4:03 PM Paulo Motta 
 wrote:

> Hi everyone,
>
> We need to tell ASF how many slots we will need for GSoC (if any) by
> February 20. So far we don't have any subscribed project ideas.
>
> If you are interested in being a GSoC mentor, just ping me on slack
> and I will be happy to give you feedback on the project idea proposal.
> Please do so by no later than February 10 to allow sufficient time for
> follow-ups.
>
> Cheers,
>
> Paulo
>
> On Wed, Jan 19, 2022 at 10:54, Paulo Motta wrote:
>
>> Hi everyone,
>>
>> Following up from the initial GSoC Kick-off thread [1] I would like
>> to invite contributors to submit GSoC project ideas. In order to submit a
>> project idea, just tag a JIRA ticket with the "gsoc" label and add yourself
>> to the "Mentor" field to indicate you're willing to mentor this project.
>>
>> Existing JIRA tickets can be repurposed as GSoC projects or new
>> tickets can be created with new features or improvements specifically for
>> GSoC. The best GSoC project ideas are those which are self-contained: they
>> have a well-defined scope, discrete milestones and a definition of done.
>> Generally the areas where it is easiest for GSoC contributors to get
>> started are:
>> - UX improvements
>> - Tools
>> - Benchmarking
>> - Refactoring and Modularization
>>
>> Non-committers are more than welcome to submit project ideas and
>> mentor projects, as long as a committer is willing to co-mentor the
>> project. As a matter of fact I was a GSoC mentor before becoming a
>> committer, so I can say this is a great way to pave your way to
>> committership. ;)
>>
>> Mentor tasks involve having 1 or 2 weekly meetings with the GSoC
>> participant to track the project status and give guidance to the
>> participant towards the completion of the project, as well as reviewing
>> code submissions.
>>
>> This year, GSoC is open to any participant over 18 years of age, no
>> longer focusing solely on university students. GSoC projects can be
>> ~175 hours (medium) or 350 hours (large), and can range from 12 to 22 weeks
>> starting in July.
>>
>> We have a little less than 2 months until the start of the GSoC
>> application period on March 7, but ideally we want to have an "Ideas List"
>> ready before that so prospective participants can start engaging with the
>> project and working with mentors to refine the project before submitting an
>> application.
>>
>> This year I will not be able to participate as a primary mentor but I
>> would be happy to co-mentor other projects as well as help with questions
>> and guidance.
>>
>> Kind regards,
>>
>> Paulo
>>
>> [1] https://lists.apache.org/thread/58v2bvfzwtfgqdx90qmm4tmyoqzsgtn4
>>
>

 --

 Henrik Ingo


Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-14 Thread Mike Adamson
> We don't need a whole "codec framework" for V1, but we're still embedding 
> some versioning information in the column index on-disk structures, right?

I’m not sure why we would want to pull the versioning code only to have to put 
it back in as soon as we need to change the on-disk format. We also need to 
consider whether the legacy format used by DSE is supported in OSS. I’m not 
sure of the policy on this although I strongly suspect that the answer is that 
it won’t be supported. Either way, it would seem to be a lot of work to pull 
the versioning code out at this point since it formed part of a major refactor 
of the SAI framework and plumbing.

MikeA

> On 11 Feb 2022, at 18:47, Caleb Rackliffe  wrote:
> 
> Just finished reading the latest version of the CEP. Here are my thoughts:
> 
> - We've already talked about OR queries, so I won't rehash that, but 
> tokenization support seems like it might be another one of those places where 
> we can cut scope if we want to get V1 out the door. It shouldn't be that hard 
> to detangle from the rest of the code.
> - We mention the JMX metric ecosystem in the CEP, but not the related virtual 
> tables. This isn't a big issue, and doesn't mean we need to change the CEP, 
> but it might be helpful for those not familiar with the existing prototype to 
> know they exist :)
> - It's probably below the line for CEP discussion, but the text and numeric 
> index formats will probably change over time. We don't need a whole "codec 
> framework" for V1, but we're still embedding some versioning information in 
> the column index on-disk structures, right?
> 
> To offset my obvious partiality around this CEP, I've already made an effort 
> to raise some of the issues that may come up to challenge us from a macro 
> perspective. It seems like the prevailing opinion here is that they are 
> either surmountable or simply basic conceptual difficulties w/ distributed 
> secondary indexing.
> 
> tl;dr I'm +1 on bringing this to a vote and starting to put together all the 
> pieces for CASSANDRA-16052 :)
> 
> On Thu, Feb 10, 2022 at 11:26 AM Mike Adamson wrote:
> > I'd be interested to hear from Mike/Jason on the OR support topic, of 
> > course.
> 
> The support for OR within SAI is fairly minimal and will not work without the 
> non-SAI changes needed. Since the non-SAI OR changes are extensive it would 
> be better to bring those in under their own CEP. 
> 
> I’d leave the decision of whether to put the rest of SAI behind an 
> experimental flag to others. My preference would be to not do so because the 
> non-OR implementation has been tested and used in production for over a year 
> now.
> 
> MikeA
> 
>> On 9 Feb 2022, at 13:06, bened...@apache.org wrote:
>> 
>> > Is there some mechanism such as experimental flags, which would allow the 
>> > SAI-only OR support to be merged into trunk
>>  
>> FWIW, I’m OK with this merging to trunk, either hidden behind a CI-only flag 
>> or exposed to the user via some experimental flag (and a suitable NEWS.txt). 
>> We’ve discussed the need to periodically merge feature branches with trunk 
>> before they are complete. If the work is logically complete for SAI, and 
>> we’re only pending work to make OR consistent between SAI and non-SAI 
>> queries, I think that more than meets this criterion.
>>  
>>  
>> From: Henrik Ingo
>> Date: Monday, 7 February 2022 at 12:03
>> To: dev@cassandra.apache.org
>> Subject: Re: [DISCUSS] CEP-7 Storage Attached Index
>> 
>> Thanks Benjamin for reviewing and raising this.
>>  
>> While I don't speak for the CEP authors, just some thoughts from me:
>>  
>> On Mon, Feb 7, 2022 at 11:18 AM Benjamin Lerer wrote:
>> I would like to raise 2 points regarding the current CEP proposal:
>>  
>> 1. There is mention of some target versions and of the removal of SASI 
>>  
>> At this point, we have not agreed on any version numbers and I do not feel 
>> that removing SASI should be part of the proposal for now.
>> It seems to me that we should see first the adoption surrounding SAI before 
>> talking about deprecating other solutions.
>>  
>>  
>> This seems rather uncontroversial. I think the CEP template and previous 
>> CEPs invite discussion on whether the new feature will or may replace 
>> an existing feature. But at the same time that's of course out of scope for 
>> the work at hand. I have no opinion one way or the other myself.
>>  
>>  
>> 2. OR queries
>>  
>> It is unclear to me if the proposal is about adding OR support only for SAI 
>> indexes or for other types of queries too.
>> In the past, we had the nasty habit in CQL of providing only partially 
>> implemented features, which resulted in a bad user experience.
>> Some examples

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-14 Thread Caleb Rackliffe
Agreed there’s no reason to pull it out. I was just wondering what state it was 
in, given I didn’t see it mentioned in the CEP.

> On Feb 14, 2022, at 8:12 AM, Mike Adamson  wrote:
> 
> > We don't need a whole "codec framework" for V1, but we're still embedding 
> some versioning information in the column index on-disk structures, right?
> 
> I’m not sure why we would want to pull the versioning code only to have to 
> put it back in as soon as we need to change the on-disk format. We also need 
> to consider whether the legacy format used by DSE is supported in OSS. I’m 
> not sure of the policy on this although I strongly suspect that the answer is 
> that it won’t be supported. Either way, it would seem to be a lot of work to 
> pull the versioning code out at this point since it formed part of a major 
> refactor of the SAI framework and plumbing.
> 
> MikeA
> 
>> On 11 Feb 2022, at 18:47, Caleb Rackliffe  wrote:
>> 
>> Just finished reading the latest version of the CEP. Here are my thoughts:
>> 
>> - We've already talked about OR queries, so I won't rehash that, but 
>> tokenization support seems like it might be another one of those places 
>> where we can cut scope if we want to get V1 out the door. It shouldn't be 
>> that hard to detangle from the rest of the code.
>> - We mention the JMX metric ecosystem in the CEP, but not the related 
>> virtual tables. This isn't a big issue, and doesn't mean we need to change 
>> the CEP, but it might be helpful for those not familiar with the existing 
>> prototype to know they exist :)
>> - It's probably below the line for CEP discussion, but the text and numeric 
>> index formats will probably change over time. We don't need a whole "codec 
>> framework" for V1, but we're still embedding some versioning information in 
>> the column index on-disk structures, right?
>> 
>> To offset my obvious partiality around this CEP, I've already made an effort 
>> to raise some of the issues that may come up to challenge us from a macro 
>> perspective. It seems like the prevailing opinion here is that they are 
>> either surmountable or simply basic conceptual difficulties w/ distributed 
>> secondary indexing.
>> 
>> tl;dr I'm +1 on bringing this to a vote and starting to put together all the 
>> pieces for CASSANDRA-16052 :)
>> 
>>> On Thu, Feb 10, 2022 at 11:26 AM Mike Adamson  wrote:
>>> > I'd be interested to hear from Mike/Jason on the OR support topic, of 
>>> > course.
>>> 
>>> The support for OR within SAI is fairly minimal and will not work without 
>>> the non-SAI changes needed. Since the non-SAI OR changes are extensive it 
>>> would be better to bring those in under their own CEP. 
>>> 
>>> I’d leave the decision of whether to put the rest of SAI behind an 
>>> experimental flag to others. My preference would be to not do so because 
>>> the non-OR implementation has been tested and used in production for over a 
>>> year now.
>>> 
>>> MikeA
>>> 
 On 9 Feb 2022, at 13:06, bened...@apache.org wrote:
 
 > Is there some mechanism such as experimental flags, which would allow 
 > the SAI-only OR support to be merged into trunk
  
 FWIW, I’m OK with this merging to trunk, either hidden behind a CI-only 
 flag or exposed to the user via some experimental flag (and a suitable 
 NEWS.txt). We’ve discussed the need to periodically merge feature branches 
 with trunk before they are complete. If the work is logically complete for 
 SAI, and we’re only pending work to make OR consistent between SAI and 
 non-SAI queries, I think that more than meets this criterion.
  
  
 From: Henrik Ingo 
 Date: Monday, 7 February 2022 at 12:03
 To: dev@cassandra.apache.org 
 Subject: Re: [DISCUSS] CEP-7 Storage Attached Index
 
 Thanks Benjamin for reviewing and raising this.
  
 While I don't speak for the CEP authors, just some thoughts from me:
  
 On Mon, Feb 7, 2022 at 11:18 AM Benjamin Lerer  wrote:
 I would like to raise 2 points regarding the current CEP proposal:
  
 1. There is mention of some target versions and of the removal of SASI 
  
 At this point, we have not agreed on any version numbers and I do not feel 
 that removing SASI should be part of the proposal for now.
 It seems to me that we should see first the adoption surrounding SAI 
 before talking about deprecating other solutions.
  
  
 This seems rather uncontroversial. I think the CEP template and previous 
 CEPs invite discussion on whether the new feature will or may replace 
 an existing feature. But at the same time that's of course out of scope 
 for the work at hand. I have no opinion one way or the other myself.
  
  
 2. OR queries
  
 It is unclear to me if the proposal is about adding OR support only for 
 SAI indexes or for other types of queries too.
 In the past, we had the nasty habit for CQL to provide only p

Re: [GSOC] Call for Mentors

2022-02-14 Thread Joseph Lynch
Hi Paulo!

Thanks for organizing this. I would like to propose CASSANDRA-17381
[1] which will implement/verify BoundedReadCompactionStrategy for this
year's GSOC and I can mentor (although I think we may need a
co-mentor?). Please let me know if there is any further context I need
to provide or jira tagging I need to do (I labeled it gsoc and
gsoc2022).

[1] https://issues.apache.org/jira/browse/CASSANDRA-17381

-Joey


On Fri, Feb 11, 2022 at 1:54 PM Paulo Motta  wrote:
>
> Unfortunately we didn't, so far.
>
> On Fri, Feb 11, 2022 at 15:32, Henrik Ingo wrote:
>>
>> Hi Paulo
>>
>> Just checking, am I using Jira right: 
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20labels%20%3D%20gsoc%20and%20statusCategory%20!%3D%20Done%20
>>
>> It looks like we ended up with no gsoc projects submitted? Or am I querying 
>> wrong?
>>
>> henrik
>>
>> On Thu, Feb 3, 2022 at 12:26 AM Paulo Motta  wrote:
>>>
>>> Hi Henrik,
>>>
>>> I am happy to give feedback to project ideas - but they ultimately need to 
>>> be registered by prospective mentors on JIRA with the "gsoc" tag to be 
>>> considered a "subscribed idea".
>>>
>>> The project idea JIRA should have a "high level" overview of what the 
>>> project is:
>>> - What is the problem statement?
>>> - Rough plan on how to approach the problem.
>>> - What are the main milestones/deliverables? (e.g. 
>>> code/benchmark/framework/blog post etc.)
>>> - What prior knowledge is required to complete the task?
>>> - What warm-up tasks can the candidate do to ramp up for the project?
>>>
>>> The mentor will work with potential participants to refine the high level 
>>> description into smaller subtasks at a later stage (during candidate 
>>> application period).
>>>
>>> Cheers,
>>>
>>> Paulo
>>>
>>> On Wed, Feb 2, 2022 at 19:02, Henrik Ingo wrote:

 Hi Paulo

 I think Shaunak and Aleks V already pinged you on Slack about their ideas. 
 When you say we don't have any subscribed ideas, what is missing?

 henrik

 On Wed, Feb 2, 2022 at 4:03 PM Paulo Motta  
 wrote:
>
> Hi everyone,
>
> We need to tell ASF how many slots we will need for GSoC (if any) by 
> February 20. So far we don't have any subscribed project ideas.
>
> If you are interested in being a GSoC mentor, just ping me on slack and I 
> will be happy to give you feedback on the project idea proposal. Please 
> do so by no later than February 10 to allow sufficient time for 
> follow-ups.
>
> Cheers,
>
> Paulo
>
> On Wed, Jan 19, 2022 at 10:54, Paulo Motta wrote:
>>
>> Hi everyone,
>>
>> Following up from the initial GSoC Kick-off thread [1] I would like to 
>> invite contributors to submit GSoC project ideas. In order to submit a 
>> project idea, just tag a JIRA ticket with the "gsoc" label and add 
>> yourself to the "Mentor" field to indicate you're willing to mentor this 
>> project.
>>
>> Existing JIRA tickets can be repurposed as GSoC projects or new tickets 
>> can be created with new features or improvements specifically for GSoC. 
>> The best GSoC project ideas are those which are self-contained: they have a 
>> well-defined scope, discrete milestones and a definition of done. 
>> Generally the areas where it is easiest for GSoC contributors to get 
>> started are:
>> - UX improvements
>> - Tools
>> - Benchmarking
>> - Refactoring and Modularization
>>
>> Non-committers are more than welcome to submit project ideas and mentor 
>> projects, as long as a committer is willing to co-mentor the project. As 
>> a matter of fact I was a GSoC mentor before becoming a committer, so I 
>> can say this is a great way to pave your way to committership. ;)
>>
>> Mentor tasks involve having 1 or 2 weekly meetings with the GSoC 
>> participant to track the project status and give guidance to the 
>> participant towards the completion of the project, as well as reviewing 
>> code submissions.
>>
>> This year, GSoC is open to any participant over 18 years of age, no 
>> longer focusing solely on university students. GSoC projects can be 
>> ~175 hours (medium) or 350 hours (large), and can range from 12 to 22 
>> weeks starting in July.
>>
>> We have a little less than 2 months until the start of the GSoC 
>> application period on March 7, but ideally we want to have an "Ideas 
>> List" ready before that so prospective participants can start engaging 
>> with the project and working with mentors to refine the project before 
>> submitting an application.
>>
>> This year I will not be able to participate as a primary mentor but I 
>> would be happy to co-mentor other projects as well as help with 
>> questions and guidance.
>>
>> Kind regards,
>>
>> Paulo
>>
>> [1] htt

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-14 Thread Henrik Ingo
On Fri, Feb 11, 2022 at 8:47 PM Caleb Rackliffe 
wrote:

> Just finished reading the latest version of the CEP. Here are my thoughts:
>
> - We've already talked about OR queries, so I won't rehash that, but
> tokenization support seems like it might be another one of those places
> where we can cut scope if we want to get V1 out the door. It shouldn't be
> that hard to detangle from the rest of the code.
>

The tokenization support is already implemented. It's available in our
public fork, but at least the last time I was involved there was not really
any public documentation. Lucene comes with dozens of tokenizers, so the
documentation effort will be significant.

So the situation is similar to OR: The community may want to break out a
separate CEP to debate the user facing syntax. Alternatively, this can
simply happen as part of the PR that could be submitted as soon as CEP-7 is
approved.



> - We mention the JMX metric ecosystem in the CEP, but not the related
> virtual tables. This isn't a big issue, and doesn't mean we need to change
> the CEP, but it might be helpful for those not familiar with the existing
> prototype to know they exist :)
>

Thanks for the callout. Maybe they should indeed be mentioned together.


> - It's probably below the line for CEP discussion, but the text and
> numeric index formats will probably change over time. We don't need a whole
> "codec framework" for V1, but we're still embedding some versioning
> information in the column index on-disk structures, right?
>
>
On the contrary, this is a very valid question. As you know, SAI has been GA
for over a year in both our DSE and Astra products, and what CEP-7 describes
for inclusion in Cassandra is known to the SAI team as V2. (But to be clear,
it's named V1 in the CEP and in the context of Cassandra!) So the code does
contain facilities to support multiple generations of index formats. When an
sstable of the older version is encountered, the relevant code is used to
read its index files. Upon compaction, the newer version is written. And
there needs to be some kind of global check so that new features only become
available once all sstables cluster-wide are of the required version.
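
To make the shape of this concrete, here is a rough sketch in Python. It is
not taken from the SAI code base: the version numbers, the reader names, and
the SSTableIndex, reader_for, compact and feature_enabled names are all made
up for illustration. It only shows the three ideas above: pick a reader per
on-disk version, always write the newest version on compaction, and gate new
features on a cluster-wide check.

```python
from dataclasses import dataclass

# Hypothetical version labels; the real on-disk identifiers are not given here.
OLD_FORMAT = 1
NEW_FORMAT = 2


@dataclass(frozen=True)
class SSTableIndex:
    """Illustrative stand-in for the index files attached to one sstable."""
    name: str
    format_version: int


def reader_for(index):
    """Pick the reader that matches the on-disk format of this index."""
    readers = {OLD_FORMAT: "legacy-reader", NEW_FORMAT: "current-reader"}
    return readers[index.format_version]


def compact(indexes, output_name):
    """Compaction reads any supported version but always writes the newest."""
    return SSTableIndex(output_name, NEW_FORMAT)


def feature_enabled(cluster, required_version):
    """Allow a feature only if every index on every node is new enough.

    `cluster` maps a node name to the list of SSTableIndex objects it holds.
    """
    return all(
        index.format_version >= required_version
        for node_indexes in cluster.values()
        for index in node_indexes
    )
```

With one old-format index left anywhere in the cluster, feature_enabled
returns False; once compaction has rewritten it, the same call returns True.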


> To offset my obvious partiality around this CEP, I've already made an
> effort to raise some of the issues that may come up to challenge us from a
> macro perspective. It seems like the prevailing opinion here is that they
> are either surmountable or simply basic conceptual difficulties w/
> distributed secondary indexing.
>
>
This might be a good moment to say that we really appreciate your
investment and support in this CEP!

henrik


Re: [VOTE] Release Apache Cassandra 4.0.3

2022-02-14 Thread Mick Semb Wever
>
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
>

+1

Checked
- signing correct
- checksums are correct
- source artefact builds
- binary artefact runs
- debian package runs
- redhat package runs
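
For anyone who wants to script the checksum step, a minimal sketch in Python
follows. It assumes the published checksum is a hex digest, optionally
followed by a file name, and that the artifact has already been downloaded;
the file_digest and checksum_matches names are illustrative and not part of
any release tooling.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, published, algorithm="sha512"):
    """Compare a file's digest with the contents of a published checksum file.

    `published` is the text of the checksum file; only its first
    whitespace-separated token is treated as the expected hex digest.
    """
    tokens = published.split()
    if not tokens:
        return False
    return file_digest(path, algorithm) == tokens[0].lower()
```

This covers checksums only. Signature verification is a separate step and is
not shown here.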


Re: [VOTE] Release Apache Cassandra 4.0.3

2022-02-14 Thread Brandon Williams
+1

On Sun, Feb 13, 2022 at 4:03 PM Mick Semb Wever  wrote:
>
> Proposing the test build of Cassandra 4.0.3 for release.
>
>
> sha1: a87055d56a33a9b17606f14535f48eb461965b82
>
> Git: 
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0.3-tentative
>
> Maven Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1259/org/apache/cassandra/cassandra-all/4.0.3/
>
> The Source and Build Artifacts, and the Debian and RPM packages and 
> repositories, are available here: 
> https://dist.apache.org/repos/dist/dev/cassandra/4.0.3/
>
> The vote will be open for 72 hours (longer if needed). Everyone who has 
> tested the build is invited to vote. Votes by PMC members are considered 
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
> [1]: CHANGES.txt: 
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0.3-tentative
> [2]: NEWS.txt: 
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0.3-tentative
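
The pass condition quoted above (at least three binding +1s and no -1's) can
be written down as a small check. This is only a sketch: the Vote type and
the vote_passes name are made up for illustration, and it assumes that any -1
blocks the vote, whether binding or not, which the wording above does not
spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """One vote on the thread; names used with it are purely illustrative."""
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True for votes cast by PMC members


def vote_passes(votes, min_binding_plus_ones=3):
    """Return True if the vote meets the rule quoted in the vote email.

    Assumption: a -1 from anyone, binding or not, prevents the vote passing.
    """
    binding_plus_ones = sum(1 for v in votes if v.binding and v.value == 1)
    has_minus_one = any(v.value == -1 for v in votes)
    return binding_plus_ones >= min_binding_plus_ones and not has_minus_one
```

Non-binding +1s are counted by nobody here; they simply do not contribute to
the threshold.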


Re: [VOTE] Release Apache Cassandra 4.0.3

2022-02-14 Thread Marcus Eriksson
+1

On Sun, Feb 13, 2022 at 11:03:01PM +0100, Mick Semb Wever wrote:
> Proposing the test build of Cassandra 4.0.3 for release.
> 
> 
> sha1: a87055d56a33a9b17606f14535f48eb461965b82
> 
> Git:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0.3-tentative
> 
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1259/org/apache/cassandra/cassandra-all/4.0.3/
> 
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/4.0.3/
> 
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
> 
> [1]: CHANGES.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0.3-tentative
> [2]: NEWS.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0.3-tentative