Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-09 Thread Jonathan Haddad
> I’m still not sure if having tombstones vs. empty values / frozen UDTs will have the same results. When in doubt, benchmark. Good luck, Jon On Wed, Jan 9, 2019 at 3:02 PM Tomas Bartalos wrote: > Losing atomic updates is a good point, but in my use case it's not a > problem, since I always ov

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-09 Thread Tomas Bartalos
Losing atomic updates is a good point, but in my use case it's not a problem, since I always overwrite the whole record (no partial updates). I’m still not sure if having tombstones vs. empty values / frozen UDTs will have the same results. When I update one row with 10 null columns it will cr

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread DuyHai Doan
The idea of storing your data as a single blob can be dangerous. Indeed, you lose the ability to perform atomic updates on each column. In Cassandra, last-write-wins (LWW) is the rule. Suppose 2 concurrent updates on the same row, 1st update changes column Firstname (let's say it's a Person record) and 2nd update
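
To illustrate the last-write-wins point above, here is a minimal Python sketch that simulates per-cell timestamps in memory. It does not talk to Cassandra; the helpers `lww_merge` and `apply_write`, the `lastname` column, the `person` blob column, and the sample values are all assumptions made for illustration. Timestamp ties are resolved in favour of the first argument here, which is a simplification.

```python
def lww_merge(cell_a, cell_b):
    """Return whichever (value, timestamp) cell carries the newer timestamp."""
    return cell_a if cell_a[1] >= cell_b[1] else cell_b


def apply_write(row, update, ts):
    """Apply a write column by column, resolving each cell with last-write-wins."""
    merged = dict(row)
    for col, val in update.items():
        new_cell = (val, ts)
        merged[col] = lww_merge(new_cell, merged[col]) if col in merged else new_cell
    return merged


# Case 1: one cell per column. Two writers touch different columns.
base = {"firstname": ("John", 1), "lastname": ("Doe", 1)}
per_column = apply_write(base, {"firstname": "Jon"}, ts=2)
per_column = apply_write(per_column, {"lastname": "Smith"}, ts=3)

# Case 2: the whole record serialized into one cell (hypothetical column "person").
original = {"firstname": "John", "lastname": "Doe"}
blob_row = {"person": (original, 1)}
writer1 = dict(original, firstname="Jon")    # built from the original snapshot
writer2 = dict(original, lastname="Smith")   # also built from the original snapshot
blob_row = apply_write(blob_row, {"person": writer1}, ts=2)
blob_row = apply_write(blob_row, {"person": writer2}, ts=3)

per_column_values = {col: cell[0] for col, cell in per_column.items()}
blob_value = blob_row["person"][0]
```

With separate columns both changes survive, because each cell is resolved on its own. With a single blob the later write replaces the whole cell, so the first writer's change to `firstname` is lost.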

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Jonathan Haddad
Those are two different cases though. It *sounds like* (again, I may be missing the point) you're trying to overwrite a value with another value. You're either going to serialize a blob and overwrite a single cell, or you're going to overwrite all the cells and include a tombstone. When you do a

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Tomas Bartalos
Hello Jon, I thought having tombstones is a much higher overhead than just overwriting values. The compaction overhead can be similar, but I think the read performance is much worse. Tombstones accumulate and hang around for 10 days (by default) before they are eligible for compaction. Also we have tom

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Jonathan Haddad
If you're overwriting values, it really doesn't matter much if it's a tombstone or any other value, they still need to be compacted and have the same overhead at read time. Tombstones are problematic when you try to use Cassandra as a queue (or something like a queue) and you need to scan over tho

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Tomas Bartalos
Hello, I believe your approach is the same as using Spark with "spark.cassandra.output.ignoreNulls=true". This will not cover the situation when a value has to be overwritten with null. I found one possible solution - change the schema to keep only primary key fields and move all other fields to

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread DuyHai Doan
"The problem is I can't know the combination of set/unset values" --> Just for this requirement, Achilles has had a working solution for many years using the INSERT_NOT_NULL_FIELDS strategy: https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy Or you can use the Update API that by design only perf

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Tomas Bartalos
Hello, The problem is I can't know the combination of set/unset values. From my perspective every value should be set. The event from Kafka represents the complete state of the happening at a certain point in time. In my table I want to store the latest event so the most recent state of the happenin

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Eric Stevens
Depending on the use case, creating separate prepared statements for each combination of set / unset values in large INSERT/UPDATE statements may be prohibitive. Instead, you can look into driver level support for UNSET values. Requires Cassandra 2.2 or later IIRC. See: Java Driver: https://docs
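
As a rough illustration of why one prepared statement per set/unset combination can get out of hand: with n optional columns there are 2^n possible subsets. The Python sketch below just enumerates them; `column_subsets` and the column names are hypothetical.

```python
from itertools import combinations


def column_subsets(optional_columns):
    """Yield every subset of the optional columns that a single write might set."""
    for size in range(len(optional_columns) + 1):
        yield from combinations(optional_columns, size)


# Three optional columns already need 2**3 distinct statements.
small = list(column_subsets(["a", "b", "c"]))

# Ten optional columns need 2**10.
large_count = sum(1 for _ in column_subsets([f"col{i}" for i in range(10)]))
```

Here `len(small)` is 8 and `large_count` is 1024, so the number of statements doubles with every additional optional column.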

RE: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Durity, Sean R
You say the events are incremental updates. I am interpreting this to mean only some columns are updated. Others should keep their original values. You are correct that inserting null creates a tombstone. Can you insert only the columns that actually have new values? Just skip the columns with
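
The suggestion above, naming only the columns that carry a value, could be sketched in Python roughly as follows. The table name `events`, the column names, and the helper `build_insert` are assumptions for illustration; the function only builds a CQL string with `?` placeholders plus a parameter list, and does not call any driver.

```python
def build_insert(table, row):
    """Build an INSERT that names only the columns whose value is not None."""
    present = {col: val for col, val in row.items() if val is not None}
    if not present:
        raise ValueError("row has no non-null columns")
    columns = ", ".join(present)
    placeholders = ", ".join("?" for _ in present)
    cql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return cql, list(present.values())


# "detail" is None, so it is left out of the statement and no tombstone is written for it.
cql, params = build_insert("events", {"id": 42, "name": "login", "detail": None})
```

Note that a skipped column keeps whatever value was stored for it before, so this approach does not help when a value really has to be cleared.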