CASSANDRA-14183 review request -> logback upgrade to fix CVE

2018-01-30 Thread Thiago Veronezi
Hi dev team,

Could one of you take a look at this JIRA ticket?
https://issues.apache.org/jira/browse/CASSANDRA-14183

It has a patch available for a known security issue in one of the
dependencies. The patch contains only trivial code changes, so it should be
straightforward to review. Any feedback is very welcome.

Thanks,
Thiago


range queries on partition key supported?

2018-01-30 Thread Tyagi, Preetika
Hi All,

I have a quick question about how Cassandra handles partition keys. I know
that range queries are allowed in general, but are they also allowed on
partition keys? The partition key is used as input to determine a node in the
cluster, so I'm wondering how one could perform a range query on it.

Thanks,
Preetika



Re: range queries on partition key supported?

2018-01-30 Thread J. D. Jordan
A range query can be performed on the token of a partition key, not on the 
value.
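
As a rough illustration (not Cassandra's actual code), the sketch below models
a token ring in Python. The SHA-256 based toy_token function, the four node
names, and their tokens are all assumptions made up for this example;
Cassandra's default partitioner uses Murmur3. It shows why a range over tokens
maps to a contiguous run of token owners, while the key values themselves are
scattered around the ring by hashing.

```python
import bisect
import hashlib
from typing import List


def toy_token(partition_key: str) -> int:
    """Hash a key to a signed 64-bit token (SHA-256 stand-in, not Murmur3)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


# Hypothetical four-node ring; each node owns (previous token, own token].
RING = [
    (-6 * 10**18, "node-a"),
    (-2 * 10**18, "node-b"),
    (2 * 10**18, "node-c"),
    (6 * 10**18, "node-d"),
]
TOKENS = [token for token, _ in RING]


def owner(token: int) -> str:
    """Node owning a token: the first node token >= it, wrapping around."""
    index = bisect.bisect_left(TOKENS, token)
    return RING[index % len(RING)][1]


def nodes_for_token_range(start: int, end: int) -> List[str]:
    """Owners of the tokens in (start, end], assuming start < end."""
    first = bisect.bisect_right(TOKENS, start)
    last = bisect.bisect_left(TOKENS, end)
    names = [RING[i % len(RING)][1] for i in range(first, last + 1)]
    return list(dict.fromkeys(names))  # drop repeats, keep ring order


if __name__ == "__main__":
    # Neighbouring key values usually land on unrelated tokens.
    for key in ("cluster1", "cluster2", "cluster3"):
        print(key, toy_token(key), owner(toy_token(key)))
    print(nodes_for_token_range(-3 * 10**18, 3 * 10**18))
```

In this toy ring, the token range (-3e18, 3e18] is served by node-b, node-c
and node-d, so a token range can span several nodes even though no ordering of
the key values is involved.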

-Jeremiah

> On Jan 30, 2018, at 12:21 PM, Tyagi, Preetika  
> wrote:
> 
> Hi All,
> 
> I have a quick question on Cassandra's behavior in case of partition keys. I 
> know that range queries are allowed in general, however, is it also allowed 
> on partition keys as well? The partition key is used as an input to determine 
> a node in a cluster, so I'm wondering how one can possibly perform range 
> query on that.
> 
> Thanks,
> Preetika
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



create branch in my github account

2018-01-30 Thread Tyagi, Preetika
Hi all,

I'm working on the JIRA ticket CASSANDRA-13981 and pushed a patch yesterday.
However, it was suggested that I create a branch in my GitHub account and push
all changes to it. The patch is quite big, so this seems to be a better
approach.
I haven't done this before, so I want to make sure I do it correctly without
messing things up :)


1.  On Cassandra GitHub: https://github.com/apache/cassandra, click on 
"Fork" to create my own copy in my account.

2.  Git clone on the forked branch above

3.  Git checkout 

4.  Apply my patch

5.  Git commit -m ""

6.  Git push origin trunk

Please let me know if you notice any issues. Thanks for your help!

Preetika






Re: create branch in my github account

2018-01-30 Thread Michael Shuler
On 01/30/2018 03:47 PM, Tyagi, Preetika wrote:
> Hi all,
> 
> I'm working on the JIRA ticket CASSANDRA-13981 and pushed a patch
> yesterday, however, I have been suggested to create a branch in my
> github account and then push all changes into that. The patch is too
> big hence this seems to be a better approach. I haven't done it
> before so wanted to ensure I do it correctly without messing things
> up :)
> 
> 
> 1.  On Cassandra GitHub: https://github.com/apache/cassandra,
> click on "Fork" to create my own copy in my account.
> 
> 2.  Git clone on the forked branch above

s/branch/repository/ - this is a new forked repo, not a branch

> 3.  Git checkout 

git checkout trunk
  # since 13981 appears to be for 4.0 (trunk)
  # if you worked off some random sha, you may need to rebase on
  # trunk HEAD, otherwise it may not cleanly merge and that will be
  # the first patch review request.

git checkout -b CASSANDRA-13981
  # create a new branch

> 4.  Apply my patch
> 
> 5.  Git commit -m ""
> 
> 6.  Git push origin trunk

git push origin CASSANDRA-13981  # push a new branch to your fork

> Please let me know if you notice any issues. Thanks for your help!

You could do this in your fork on the trunk repository, but it's
probably better to create a new branch, so you can fetch changes from
the upstream trunk branch and rebase your branch, if that is needed. It
is very common to have a number of remotes configured in your local
repository: one for your fork, one for the apache upstream, ones for
other users' forks, etc. If you do your work directly in your trunk
branch, you'll have conflicts when pulling in new commits from
apache/cassandra trunk, for example.

-- 
Michael




Re: create branch in my github account

2018-01-30 Thread Michael Shuler
On 01/30/2018 04:03 PM, Michael Shuler wrote:
> 
> You could do this in your fork on the trunk repository, but it's

You could do this in your fork on the trunk *branch*, but...

- apologies for mixing terms, while trying to be clear and helpful about
terminology :)

-- 
Michael




CDC usability and future development

2018-01-30 Thread Andrew Prudhomme
Hi all,

We are currently designing a system that allows our Cassandra clusters to
produce a stream of data updates. Naturally, we have been evaluating if CDC
can aid in this endeavor. We have found several challenges in using CDC for
this purpose.

CDC provides only the mutation as opposed to the full column value, which
tends to be of limited use for us. Applications might want to know the full
column value, without having to issue a read back. We also see value in
being able to publish the full column value both before and after the
update. This is especially true when deleting a column since this stream
may be joined with others, or consumers may require other fields to
properly process the delete.

Additionally, there is some difficulty with processing CDC itself such as:
- Updates not being immediately available (addressed by CASSANDRA-12148)
- Each node providing an independent stream of updates that must be
unified and deduplicated
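
To make the second point concrete, here is a minimal Python sketch of unifying
per-node streams. Everything in it is an assumption for illustration: the
Update tuple, its fields, and the idea that each node's stream is an in-memory
list sorted by write timestamp. Real CDC data lives in commit log segments and
would need to be parsed first.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Update(NamedTuple):
    """Hypothetical record for one replicated write."""
    timestamp: int       # write timestamp in microseconds
    partition_key: str
    payload: str         # stand-in for the serialized mutation


def unify(streams: Iterable[Iterable[Update]]) -> Iterator[Update]:
    """Merge sorted per-node streams and emit each distinct update once."""
    seen = set()  # unbounded here; a real consumer would need a window
    for update in heapq.merge(*streams):
        if update not in seen:
            seen.add(update)
            yield update


if __name__ == "__main__":
    node_a = [Update(1, "k1", "a"), Update(3, "k2", "b")]
    node_b = [Update(1, "k1", "a"), Update(2, "k3", "c")]
    for update in unify([node_a, node_b]):
        print(update)
```

With the two sample streams, the write to k1 appears on both nodes but is
emitted once, followed by the writes to k3 and k2 in timestamp order.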

Our question is, what is the vision for CDC development? The current
implementation could work for some use cases, but is a ways from a general
streaming solution. I understand that the nature of Cassandra makes this
quite complicated, but are there any thoughts or desires on the future
direction of CDC?

Thanks


Re: CDC usability and future development

2018-01-30 Thread Jeff Jirsa
Here's a deck of some proposed additions, discussed at one of the NGCC
sessions last fall:

https://github.com/ngcc/ngcc2017/blob/master/CassandraDataIngestion.pdf



On Tue, Jan 30, 2018 at 5:10 PM, Andrew Prudhomme  wrote:

> Hi all,
>
> We are currently designing a system that allows our Cassandra clusters to
> produce a stream of data updates. Naturally, we have been evaluating if CDC
> can aid in this endeavor. We have found several challenges in using CDC for
> this purpose.
>
> CDC provides only the mutation as opposed to the full column value, which
> tends to be of limited use for us. Applications might want to know the full
> column value, without having to issue a read back. We also see value in
> being able to publish the full column value both before and after the
> update. This is especially true when deleting a column since this stream
> may be joined with others, or consumers may require other fields to
> properly process the delete.
>
> Additionally, there is some difficulty with processing CDC itself such as:
> - Updates not being immediately available (addressed by CASSANDRA-12148)
> - Each node providing an independent streams of updates that must be
> unified and deduplicated
>
> Our question is, what is the vision for CDC development? The current
> implementation could work for some use cases, but is a ways from a general
> streaming solution. I understand that the nature of Cassandra makes this
> quite complicated, but are there any thoughts or desires on the future
> direction of CDC?
>
> Thanks
>
>


RE: range queries on partition key supported?

2018-01-30 Thread Tyagi, Preetika
So that means more than one node can be selected to fulfill a range query
based on the token, correct?

I was looking at this link: 
https://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause

In the example query,
SELECT * FROM numberOfRequests
WHERE token(cluster, date) > token('cluster1', '2015-06-03')
AND token(cluster, date) <= token('cluster1', '2015-06-05')
AND time = '12:00'

More than one node might be picked for this token-based range query, and then
the entire partition on each node will be searched based on the clustering key
(i.e. "time" in this case).
Is my understanding correct?

Thanks,
Preetika

-----Original Message-----
From: J. D. Jordan [mailto:jeremiah.jor...@gmail.com] 
Sent: Tuesday, January 30, 2018 10:13 AM
To: dev@cassandra.apache.org
Subject: Re: range queries on partition key supported?

A range query can be performed on the token of a partition key, not on the 
value.

-Jeremiah

> On Jan 30, 2018, at 12:21 PM, Tyagi, Preetika  
> wrote:
> 
> Hi All,
> 
> I have a quick question on Cassandra's behavior in case of partition keys. I 
> know that range queries are allowed in general, however, is it also allowed 
> on partition keys as well? The partition key is used as an input to determine 
> a node in a cluster, so I'm wondering how one can possibly perform range 
> query on that.
> 
> Thanks,
> Preetika
> 
