secondary index table - tombstones surviving compactions

2018-05-18 Thread Roman Bielik
Hi,

I have a Cassandra 3.11 table (with compact storage) and using secondary
indices with rather unique data stored in the indexed columns. There are
many inserts and deletes, so in order to avoid tombstones piling up I'm
re-using primary keys from a pool (which works fine).
I'm aware that this design pattern is not ideal, but for now I cannot
change it easily.

The problem is, the size of 2nd index tables keeps growing (filled with
tombstones) no matter what.

I tried some aggressive configuration (just for testing) in order to
expedite the tombstone removal, but with little-to-zero effect:
compaction = { 'class': 'LeveledCompactionStrategy',
               'unchecked_tombstone_compaction': 'true',
               'tombstone_compaction_interval': 600 }
gc_grace_seconds = 600
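
Applied via CQL this would look roughly like the following (keyspace and
table names are illustrative; the same options can also be set through
Thrift's CfDef):

```cql
-- Illustrative names; only the options matter here.
ALTER TABLE my_ks.my_table
WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'unchecked_tombstone_compaction': 'true',
    'tombstone_compaction_interval': 600 }
AND gc_grace_seconds = 600;
```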

I'm aware that perhaps materialized views could provide a solution to this,
but I'm bound to the Thrift interface, so I cannot use them.

Questions:
1. Is there something I'm missing? How come compaction does not remove the
obsolete indices/tombstones from 2nd index tables? Can I trigger the
cleanup manually somehow?
I have tried nodetool flush, compact, rebuild_index on both data table and
internal Index table, but with no result.

2. When deleting a record I delete the whole row at once - which, if I'm
correct, creates one tombstone for the whole record. Would it help to
delete the indexed columns separately, creating an extra tombstone for each
cell?
As I understand the underlying mechanism, the indexed column value must be
read so that a proper tombstone can be created for the index entry.
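
To illustrate the difference between the two delete patterns (table and
column names are hypothetical):

```cql
-- Variant A: one row-level tombstone; index entries not deleted explicitly.
DELETE FROM my_ks.my_table WHERE key = 42;

-- Variant B: one cell tombstone per indexed column, then the row tombstone.
DELETE column1, column2 FROM my_ks.my_table WHERE key = 42;
DELETE FROM my_ks.my_table WHERE key = 42;
```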

3. Could the fact that I'm reusing the primary key of a deleted record
shortly for a new insert interact with the secondary index tombstone
removal?

Will be grateful for any advice.

Regards,
Roman

--

Re: secondary index table - tombstones surviving compactions

2018-05-23 Thread Roman Bielik
Hi,

I apologise for the late response; I wanted to run some further tests so I
could provide more information to you.

@Jeff, no, I don't set the "only_purge_repaired_tombstone" option. It
should be at its default: false.
And no, I don't run repairs during the tests.

@Eric, I understand that rapid deletes/inserts are something of an
antipattern; nevertheless, I'm not experiencing any problems with that
(except for the secondary indices).

Update: I ran a new test where I delete the indexed columns individually,
plus delete the whole row at the end.
Surprisingly, this test scenario works fine. Using nodetool flush +
compact (in order to expedite the test) seems to always purge the index
table.
So that's great because I seem to have found a workaround; on the other
hand, could there be a bug in Cassandra - a leaking index table?

Test details:
Create table with LeveledCompactionStrategy;
'tombstone_compaction_interval': 60; gc_grace_seconds=60
There are two indexed columns for comparison: column1, column2
Insert keys {1..x} with random values in column1 & column2
Delete {key:column2} (but not column1)
Delete {key}
Repeat n-times from the inserts
Wait 1 minute
nodetool flush
nodetool compact (sometimes compact  
nodetool cfstats
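
The schema behind these steps looks roughly like this (the actual test runs
over Thrift; names and values are illustrative):

```cql
-- Rough CQL equivalent of the test schema:
CREATE TABLE ks.t (key int PRIMARY KEY, column1 text, column2 text)
    WITH COMPACT STORAGE
    AND compaction = { 'class': 'LeveledCompactionStrategy',
                       'tombstone_compaction_interval': 60 }
    AND gc_grace_seconds = 60;

CREATE INDEX t_column1 ON ks.t (column1);
CREATE INDEX t_column2 ON ks.t (column2);

-- Repeated for each key in {1..x}, with random values:
INSERT INTO ks.t (key, column1, column2) VALUES (1, 'v1', 'v2');
DELETE column2 FROM ks.t WHERE key = 1;  -- extra cell tombstone, column2 only
DELETE FROM ks.t WHERE key = 1;          -- row tombstone
```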

What I observe is that the data table is empty, the column2 index table is
also empty, and the column1 index table has non-zero (leaked) "space used"
and "estimated rows".

Roman

On 18 May 2018 at 16:13, Jeff Jirsa  wrote:

> This would matter for the base table, but would be less likely for the
> secondary index, where the partition key is the value of the base row
>
> Roman: there’s a config option related to only purging repaired tombstones
> - do you have that enabled ? If so, are you running repairs?
>
> --
> Jeff Jirsa
>
>
> > On May 18, 2018, at 6:41 AM, Eric Stevens  wrote:
> >
> > The answer to Question 3 is "yes."  One of the more subtle points about
> > tombstones is that Cassandra won't remove them during compaction if there
> > is a bloom filter on any SSTable on that replica indicating that it
> > contains the same partition (not primary) key.  Even if it is older than
> > gc_grace, and would otherwise be a candidate for cleanup.
> >
> > If you're recycling partition keys, your tombstones may never be able to
> be
> > cleaned up, because in this scenario there is a high probability that an
> > SSTable not involved in that compaction also contains the same partition
> > key, and so compaction cannot have confidence that it's safe to remove
> the
> > tombstone (it would have to fully materialize every record in the
> > compaction, which is too expensive).
> >
> > In general it is an antipattern in Cassandra to write to a given
> partition
> > indefinitely for this and other reasons.

Re: secondary index table - tombstones surviving compactions

2018-05-31 Thread Roman Bielik
Hi Jordan,

thank you for accepting this as an issue.
I will follow the ticket.

Best regards,
Roman


On 30 May 2018 at 11:40, Jordan West  wrote:

> Hi Roman,
>
> I was able to reproduce the issue you described. I filed
> https://issues.apache.org/jira/browse/CASSANDRA-14479. More details there.
>
> Thanks for reporting!
> Jordan

Create table with ID - Question

2016-09-28 Thread Roman Bielik
Hi,

in CQL it is possible to create a table with explicit ID: CREATE TABLE ...
WITH ID='xyz'.
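
For concreteness, the CQL form takes a table UUID, e.g. (schema and UUID
here are purely illustrative):

```cql
CREATE TABLE ks.t (key int PRIMARY KEY, value text)
    WITH ID = '5a1c395e-b41f-11e5-9f22-ba0be0483c18';
```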

Is something like this possible via the Thrift interface?
There is an int32 "id" field in CfDef, but it has no effect on the table ID.

My problem is that concurrent create table (add_column_family) requests
for the same table name result in a clash with somewhat unpredictable
behavior.

This problem was reported in:
https://issues.apache.org/jira/browse/CASSANDRA-9933

and seems to be related to changes from ticket:
https://issues.apache.org/jira/browse/CASSANDRA-5202

A workaround for me could be using the same ID in create table; however,
I'm using the Thrift interface only.

Thank you.
Regards,
Roman

--