Re: CASSANDRA-8527

2018-01-05 Thread Alexander Dejanovski
Hi Aleksey,

ok we'll split the work and only deal with row level tombstones in
CASSANDRA-8527  and
create a follow up ticket to work on the separate counts you're suggesting.
My understanding of what you say is that you would not include range
tombstones in the warn/fail threshold, but row level tombstones have
somewhat an impact that is similar to cell tombstones. They will be
retained in memory and will be sent to replicas.
If we don't count them in the thresholds (at least for warnings), people
will miss the fact that they may be reading a lot of tombstones.
Are you ok with including those row tombstones as part of the thresholds ?
This was the original intent for creating this ticket, which was a
follow-up to CASSANDRA-8477
.

For the follow up ticket, we may want to move the discussion in JIRA once
we've create the ticket, but a merge listener seems like the right way to
detect rows shadowed by range tombstones. That would force to change the
UnfilteredRowIterator interface to include the tombstones/rows/cells stats
as this is what is returned from the lower levels of the read path.
Is there any easier way to achieve this that you can think of, as that
interface is used in many parts of the code ?

On Wed, Dec 27, 2017 at 1:35 PM Aleksey Yeshchenko 
wrote:

> As Jeff says, the number of actual tombstones is no less relevant than the
> number of
> cells and rows shadowed by them. And the way row and cell tombstones
> affect a read
> query can be very different from the way a big range tombstone might: you
> potentially
> have to retain every (regular) tombstone in memory and have replicas ship
> them to
> a coordinator, while you can discard everything shadowed by a big RT and
> only serialize
> a tiny bit of the data between the replicas and the coordinator.
>
> So a mixed metric that just mixes up rows and cells shadowed by all three
> kinds of tombstones
> without any differentiation, while better than not having that visibility
> at all, is worse than having
> a detailed rundown. If possible, I’d love to see proper separate counts
> tracked: range, row, and single-cell tombstones encountered, and # of rows
> or cells obsoleted by each type.
>
> I know that this is a non-trivial change, however, but hey, it doesn’t
> have to be a trivial patch if it’s going into 4.0.
>
> In the meantime I think it’d be helpful to report that single count. But I
> don’t like the idea of redefining what
> tombstone_warn_threshold and tombstone_failure_threshold mean, even in a
> major release, as RTs are
> qualitatively different from other tombstones, and have a much lower
> impact per dead row.
>
> —
> AY
>
> On 22 December 2017 at 03:53:47, kurt greaves (k...@instaclustr.com)
> wrote:
>
> I think that's a good idea for 4.x, but not so for current branches. I
> think as long as we document it well for 4.0 upgrades it's not so much of a
> problem. Obviously there will be cases of queries failing that were
> previously succeeding but we can already use
> tombstone_failure|warn_threshold to tune around that already. I don't think
> we need another yaml setting to enable/disable counting deleted rows for
> these thresholds, especially because it's going into 4.0. It *might* be a
> good idea to bump the tombstone failure threshold default as a safety net
> though (as well as put something in the NEWS.txt).
>
> On 21 December 2017 at 20:11, Jon Haddad  wrote:
>
> > I had suggested to Alex we kick this discussion over to the ML because
> the
> > change will have a significant impact on the behavior of Cassandra when
> > doing reads with range tombstones that cover a lot of rows. The behavior
> > now is a little weird, a single tombstone could shadow hundreds of
> > thousands or even millions of rows, and the query would probably just
> time
> > out. Personally, I’m in favor of the change in behavior of this patch but
> > I wanted to get some more feedback before committing to it. Are there any
> > objections to what Alex described?
> >
> > Regarding Mockito, I’ve been meaning to bring this up for a while, and
> I’m
> > a solid +1 on introducing it to help with testing. In an ideal world we’d
> > have no singletons and could test everything in isolation, but
> > realistically that’s a multi year process and we just aren’t there.
> >
> >
> > > On Dec 19, 2017, at 11:07 PM, Alexander Dejanovski <
> > a...@thelastpickle.com> wrote:
> > >
> > > Hi folks,
> > >
> > > I'm working on CASSANDRA-8527
> > >  and would need
> to
> > > discuss a few things.
> > >
> > > The ticket makes it visible in tracing and metrics that rows shadowed
> by
> > > range tombstones were scanned during reads.
> > > Currently, scanned range tombstones aren't reported anywhere which
> hides
> > > the cause of performance issues during reads when the users perform
> > primary
> > > key deletes.
>

Re: CASSANDRA-8527

2018-01-05 Thread Jon Haddad
I think it’s reasonable to count the number of range tombstones towards the 
total tombstone count / threshold.  

I agree the number of rows shadowed by the tombstones should be tracked 
separately, and we should probably think a bit more about how to add 
configuration / limits around this without burdening the user with a bunch of 
new flags they have to think about.  I would prefer to avoid any more 
configuration settings as complex as back_pressure_strategy going forward.

> On Jan 5, 2018, at 9:36 AM, Alexander Dejanovski  
> wrote:
> 
> Hi Aleksey,
> 
> ok we'll split the work and only deal with row level tombstones in
> CASSANDRA-8527  and
> create a follow up ticket to work on the separate counts you're suggesting.
> My understanding of what you say is that you would not include range
> tombstones in the warn/fail threshold, but row level tombstones have
> somewhat an impact that is similar to cell tombstones. They will be
> retained in memory and will be sent to replicas.
> If we don't count them in the thresholds (at least for warnings), people
> will miss the fact that they may be reading a lot of tombstones.
> Are you ok with including those row tombstones as part of the thresholds ?
> This was the original intent for creating this ticket, which was a
> follow-up to CASSANDRA-8477
> .
> 
> For the follow up ticket, we may want to move the discussion in JIRA once
> we've create the ticket, but a merge listener seems like the right way to
> detect rows shadowed by range tombstones. That would force to change the
> UnfilteredRowIterator interface to include the tombstones/rows/cells stats
> as this is what is returned from the lower levels of the read path.
> Is there any easier way to achieve this that you can think of, as that
> interface is used in many parts of the code ?
> 
> On Wed, Dec 27, 2017 at 1:35 PM Aleksey Yeshchenko 
> wrote:
> 
>> As Jeff says, the number of actual tombstones is no less relevant than the
>> number of
>> cells and rows shadowed by them. And the way row and cell tombstones
>> affect a read
>> query can be very different from the way a big range tombstone might: you
>> potentially
>> have to retain every (regular) tombstone in memory and have replicas ship
>> them to
>> a coordinator, while you can discard everything shadowed by a big RT and
>> only serialize
>> a tiny bit of the data between the replicas and the coordinator.
>> 
>> So a mixed metric that just mixes up rows and cells shadowed by all three
>> kinds of tombstones
>> without any differentiation, while better than not having that visibility
>> at all, is worse than having
>> a detailed rundown. If possible, I’d love to see proper separate counts
>> tracked: range, row, and single-cell tombstones encountered, and # of rows
>> or cells obsoleted by each type.
>> 
>> I know that this is a non-trivial change, however, but hey, it doesn’t
>> have to be a trivial patch if it’s going into 4.0.
>> 
>> In the meantime I think it’d be helpful to report that single count. But I
>> don’t like the idea of redefining what
>> tombstone_warn_threshold and tombstone_failure_threshold mean, even in a
>> major release, as RTs are
>> qualitatively different from other tombstones, and have a much lower
>> impact per dead row.
>> 
>> —
>> AY
>> 
>> On 22 December 2017 at 03:53:47, kurt greaves (k...@instaclustr.com)
>> wrote:
>> 
>> I think that's a good idea for 4.x, but not so for current branches. I
>> think as long as we document it well for 4.0 upgrades it's not so much of a
>> problem. Obviously there will be cases of queries failing that were
>> previously succeeding but we can already use
>> tombstone_failure|warn_threshold to tune around that already. I don't think
>> we need another yaml setting to enable/disable counting deleted rows for
>> these thresholds, especially because it's going into 4.0. It *might* be a
>> good idea to bump the tombstone failure threshold default as a safety net
>> though (as well as put something in the NEWS.txt).
>> 
>> On 21 December 2017 at 20:11, Jon Haddad  wrote:
>> 
>>> I had suggested to Alex we kick this discussion over to the ML because
>> the
>>> change will have a significant impact on the behavior of Cassandra when
>>> doing reads with range tombstones that cover a lot of rows. The behavior
>>> now is a little weird, a single tombstone could shadow hundreds of
>>> thousands or even millions of rows, and the query would probably just
>> time
>>> out. Personally, I’m in favor of the change in behavior of this patch but
>>> I wanted to get some more feedback before committing to it. Are there any
>>> objections to what Alex described?
>>> 
>>> Regarding Mockito, I’ve been meaning to bring this up for a while, and
>> I’m
>>> a solid +1 on introducing it to help with testing. In an ideal world we’d
>>> have no singletons and could test everything in isolation, but
>>> real

Re: CASSANDRA-8527

2018-01-05 Thread Aleksey Yeshchenko
Counting the number of range tombstones - sure, that is reasonable.

Counting cells/rows shadowed by range tombstones toward the limit, as if they 
were regular tombstones - arguably not so much.

—
AY

On 5 January 2018 at 18:53:45, Jon Haddad (j...@jonhaddad.com) wrote:

I think it’s reasonable to count the number of range tombstones towards the 
total tombstone count / threshold.  

I agree the number of rows shadowed by the tombstones should be tracked 
separately, and we should probably think a bit more about how to add 
configuration / limits around this without burdening the user with a bunch of 
new flags they have to think about. I would prefer to avoid any more 
configuration settings as complex as back_pressure_strategy going forward.  

> On Jan 5, 2018, at 9:36 AM, Alexander Dejanovski  
> wrote:  
>  
> Hi Aleksey,  
>  
> ok we'll split the work and only deal with row level tombstones in  
> CASSANDRA-8527  and  
> create a follow up ticket to work on the separate counts you're suggesting.  
> My understanding of what you say is that you would not include range  
> tombstones in the warn/fail threshold, but row level tombstones have  
> somewhat an impact that is similar to cell tombstones. They will be  
> retained in memory and will be sent to replicas.  
> If we don't count them in the thresholds (at least for warnings), people  
> will miss the fact that they may be reading a lot of tombstones.  
> Are you ok with including those row tombstones as part of the thresholds ?  
> This was the original intent for creating this ticket, which was a  
> follow-up to CASSANDRA-8477  
> .  
>  
> For the follow up ticket, we may want to move the discussion in JIRA once  
> we've create the ticket, but a merge listener seems like the right way to  
> detect rows shadowed by range tombstones. That would force to change the  
> UnfilteredRowIterator interface to include the tombstones/rows/cells stats  
> as this is what is returned from the lower levels of the read path.  
> Is there any easier way to achieve this that you can think of, as that  
> interface is used in many parts of the code ?  
>  
> On Wed, Dec 27, 2017 at 1:35 PM Aleksey Yeshchenko   
> wrote:  
>  
>> As Jeff says, the number of actual tombstones is no less relevant than the  
>> number of  
>> cells and rows shadowed by them. And the way row and cell tombstones  
>> affect a read  
>> query can be very different from the way a big range tombstone might: you  
>> potentially  
>> have to retain every (regular) tombstone in memory and have replicas ship  
>> them to  
>> a coordinator, while you can discard everything shadowed by a big RT and  
>> only serialize  
>> a tiny bit of the data between the replicas and the coordinator.  
>>  
>> So a mixed metric that just mixes up rows and cells shadowed by all three  
>> kinds of tombstones  
>> without any differentiation, while better than not having that visibility  
>> at all, is worse than having  
>> a detailed rundown. If possible, I’d love to see proper separate counts  
>> tracked: range, row, and single-cell tombstones encountered, and # of rows  
>> or cells obsoleted by each type.  
>>  
>> I know that this is a non-trivial change, however, but hey, it doesn’t  
>> have to be a trivial patch if it’s going into 4.0.  
>>  
>> In the meantime I think it’d be helpful to report that single count. But I  
>> don’t like the idea of redefining what  
>> tombstone_warn_threshold and tombstone_failure_threshold mean, even in a  
>> major release, as RTs are  
>> qualitatively different from other tombstones, and have a much lower  
>> impact per dead row.  
>>  
>> —  
>> AY  
>>  
>> On 22 December 2017 at 03:53:47, kurt greaves (k...@instaclustr.com)  
>> wrote:  
>>  
>> I think that's a good idea for 4.x, but not so for current branches. I  
>> think as long as we document it well for 4.0 upgrades it's not so much of a  
>> problem. Obviously there will be cases of queries failing that were  
>> previously succeeding but we can already use  
>> tombstone_failure|warn_threshold to tune around that already. I don't think  
>> we need another yaml setting to enable/disable counting deleted rows for  
>> these thresholds, especially because it's going into 4.0. It *might* be a  
>> good idea to bump the tombstone failure threshold default as a safety net  
>> though (as well as put something in the NEWS.txt).  
>>  
>> On 21 December 2017 at 20:11, Jon Haddad  wrote:  
>>  
>>> I had suggested to Alex we kick this discussion over to the ML because  
>> the  
>>> change will have a significant impact on the behavior of Cassandra when  
>>> doing reads with range tombstones that cover a lot of rows. The behavior  
>>> now is a little weird, a single tombstone could shadow hundreds of  
>>> thousands or even millions of rows, and the query would probably just  
>> time  
>>> o

Re: CASSANDRA-8527

2018-01-05 Thread Alexander Dejanovski
Right, I think naming it "deleted rows" is misleading because it suggests
that 100 rows shadowed by one RT would be counted as 100 tombstones, which
is not the case here (those get merged away before getting counted).

The patch counts row tombstones, where (after merge) one tombstone shadows
a single row.
Renaming "deleted rows" to "tombstone rows" or something similar would make
more sense then?

Le ven. 5 janv. 2018 à 20:24, Aleksey Yeshchenko  a
écrit :

> Counting the number of range tombstones - sure, that is reasonable.
>
> Counting cells/rows shadowed by range tombstones toward the limit, as if
> they were regular tombstones - arguably not so much.
>
> —
> AY
>
> On 5 January 2018 at 18:53:45, Jon Haddad (j...@jonhaddad.com) wrote:
>
> I think it’s reasonable to count the number of range tombstones towards
> the total tombstone count / threshold.
>
> I agree the number of rows shadowed by the tombstones should be tracked
> separately, and we should probably think a bit more about how to add
> configuration / limits around this without burdening the user with a bunch
> of new flags they have to think about. I would prefer to avoid any more
> configuration settings as complex as back_pressure_strategy going forward.
>
> > On Jan 5, 2018, at 9:36 AM, Alexander Dejanovski 
> wrote:
> >
> > Hi Aleksey,
> >
> > ok we'll split the work and only deal with row level tombstones in
> > CASSANDRA-8527 
> and
> > create a follow up ticket to work on the separate counts you're
> suggesting.
> > My understanding of what you say is that you would not include range
> > tombstones in the warn/fail threshold, but row level tombstones have
> > somewhat an impact that is similar to cell tombstones. They will be
> > retained in memory and will be sent to replicas.
> > If we don't count them in the thresholds (at least for warnings), people
> > will miss the fact that they may be reading a lot of tombstones.
> > Are you ok with including those row tombstones as part of the thresholds
> ?
> > This was the original intent for creating this ticket, which was a
> > follow-up to CASSANDRA-8477
> > .
> >
> > For the follow up ticket, we may want to move the discussion in JIRA once
> > we've create the ticket, but a merge listener seems like the right way to
> > detect rows shadowed by range tombstones. That would force to change the
> > UnfilteredRowIterator interface to include the tombstones/rows/cells
> stats
> > as this is what is returned from the lower levels of the read path.
> > Is there any easier way to achieve this that you can think of, as that
> > interface is used in many parts of the code ?
> >
> > On Wed, Dec 27, 2017 at 1:35 PM Aleksey Yeshchenko 
> > wrote:
> >
> >> As Jeff says, the number of actual tombstones is no less relevant than
> the
> >> number of
> >> cells and rows shadowed by them. And the way row and cell tombstones
> >> affect a read
> >> query can be very different from the way a big range tombstone might:
> you
> >> potentially
> >> have to retain every (regular) tombstone in memory and have replicas
> ship
> >> them to
> >> a coordinator, while you can discard everything shadowed by a big RT and
> >> only serialize
> >> a tiny bit of the data between the replicas and the coordinator.
> >>
> >> So a mixed metric that just mixes up rows and cells shadowed by all
> three
> >> kinds of tombstones
> >> without any differentiation, while better than not having that
> visibility
> >> at all, is worse than having
> >> a detailed rundown. If possible, I’d love to see proper separate counts
> >> tracked: range, row, and single-cell tombstones encountered, and # of
> rows
> >> or cells obsoleted by each type.
> >>
> >> I know that this is a non-trivial change, however, but hey, it doesn’t
> >> have to be a trivial patch if it’s going into 4.0.
> >>
> >> In the meantime I think it’d be helpful to report that single count.
> But I
> >> don’t like the idea of redefining what
> >> tombstone_warn_threshold and tombstone_failure_threshold mean, even in a
> >> major release, as RTs are
> >> qualitatively different from other tombstones, and have a much lower
> >> impact per dead row.
> >>
> >> —
> >> AY
> >>
> >> On 22 December 2017 at 03:53:47, kurt greaves (k...@instaclustr.com)
> >> wrote:
> >>
> >> I think that's a good idea for 4.x, but not so for current branches. I
> >> think as long as we document it well for 4.0 upgrades it's not so much
> of a
> >> problem. Obviously there will be cases of queries failing that were
> >> previously succeeding but we can already use
> >> tombstone_failure|warn_threshold to tune around that already. I don't
> think
> >> we need another yaml setting to enable/disable counting deleted rows for
> >> these thresholds, especially because it's going into 4.0. It *might* be
> a
> >> good idea to bump the tombstone failure threshold default as a safety
> net
> >> though (

Question on submitting a patch

2018-01-05 Thread Tyagi, Preetika
Hi all,

When I click on "Submit Patch" option, it pops up a new screen where it asks 
for a bunch of details including Fix Version(s). Does the patch need to be 
synced up with the latest repo or I can just choose which version I worked with 
(which may not necessarily be the latest and hence one would be need to fetch 
that specific repo version in order to compile source with the patch)?

Also, there is no option to upload the patch file on this screen. Can someone 
point out where to actually upload the patch? I haven't done before so might be 
asking dumb questions! :)

Thanks,
Preetika




Re: Question on submitting a patch

2018-01-05 Thread Michael Kjellman
great question! i personally always just leave everything blank to effectively 
just get the stupid state changed reason number 3000 why I personally hate 
JIRA.

> On Jan 5, 2018, at 12:44 PM, Tyagi, Preetika  wrote:
> 
> Hi all,
> 
> When I click on "Submit Patch" option, it pops up a new screen where it asks 
> for a bunch of details including Fix Version(s). Does the patch need to be 
> synced up with the latest repo or I can just choose which version I worked 
> with (which may not necessarily be the latest and hence one would be need to 
> fetch that specific repo version in order to compile source with the patch)?
> 
> Also, there is no option to upload the patch file on this screen. Can someone 
> point out where to actually upload the patch? I haven't done before so might 
> be asking dumb questions! :)
> 
> Thanks,
> Preetika
> 
> 


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Question on submitting a patch

2018-01-05 Thread Jeff Jirsa
On Fri, Jan 5, 2018 at 12:44 PM, Tyagi, Preetika 
wrote:

> Hi all,
>
> When I click on "Submit Patch" option, it pops up a new screen where it
> asks for a bunch of details including Fix Version(s).


IF you know it, the 'Fix Version' should be set to the branches the bug
impacts.

For example, a bug that is in 3.11.0 that impacts trunk but doesn't exist
in 3.0.15, you'd set 3.11.x and 4.x, that way whoever goes to merge/commit
it for you knows which branches it impacts.


> Does the patch need to be synced up with the latest repo or I can just
> choose which version I worked with (which may not necessarily be the latest
> and hence one would be need to fetch that specific repo version in order to
> compile source with the patch)?
>

Any patches are better than no patches, but obviously patches for all
impacted branches are better than a patch just for the branch you use.


>
> Also, there is no option to upload the patch file on this screen. Can
> someone point out where to actually upload the patch? I haven't done before
> so might be asking dumb questions! :)
>

Paste a link to github or similar if you need, but usually "More" ->
"Attach Files" to upload a patch, and then "Submit Patch" is the workflow
to notify everyone that it's patch-available.