Sorry, my answer was too fast. Maybe you are right.

On 05.03.2017 09:21, "benjamin roth" <brs...@gmail.com> wrote:
> No. You just change the partitioner. That's all.
>
> On 05.03.2017 09:15, "DuyHai Doan" <doanduy...@gmail.com> wrote:
>
> > "How can that be achieved? I haven't done "scientific research" yet, but I guess an "MV partitioner" could do the trick. Instead of applying the regular partitioner, an MV partitioner would calculate the PK of the base table (which is always possible) and then apply the regular partitioner."
> >
> > The main purpose of MVs is to avoid the drawbacks of the 2nd index architecture, e.g. having to scan a lot of nodes to fetch the results.
> >
> > With an MV, since you give the partition key, the guarantee is that you'll hit a single node.
> >
> > Now if you put MV data on the same node as the base table data, you're doing more-or-less the same thing as a 2nd index.
> >
> > Let's take a dead simple example:
> >
> > CREATE TABLE user (user_id uuid PRIMARY KEY, email text);
> > CREATE MATERIALIZED VIEW user_by_email AS SELECT * FROM user WHERE user_id IS NOT NULL AND email IS NOT NULL PRIMARY KEY ((email), user_id);
> >
> > SELECT * FROM user_by_email WHERE email = 'xxx';
> >
> > With this query, how can you find the user_id that corresponds to email 'xxx' so that your MV partitioner idea can work?
> >
> > On Sun, Mar 5, 2017 at 9:05 AM, benjamin roth <brs...@gmail.com> wrote:
> >
> > > While I was reading the MV paragraph in your post, an idea popped up:
> > >
> > > The problem with MV inconsistencies and inconsistent range movements is that the "MV contract" is broken. This only happens because base data and replica data reside on different hosts. If base data + replicas stayed on the same host, then a rebuild/remove would always stream both matching parts of a base table + MV.
> > >
> > > So my idea: why not make a replica ALWAYS stay local, regardless of where the token of an MV would point? That would solve these problems:
> > > 1. Rebuild / remove node would not break the MV contract.
> > > 2. A write always stays local:
> > >
> > > a) That means replication happens synchronously, so a quorum write to the base table guarantees instant data availability with a quorum read on a view.
> > >
> > > b) It saves network round trips + request/response handling, and it helps to keep a cluster healthier in case of bulk operations (like repair streams or rebuild streams). Write load stays local and is not spread across the whole cluster. I think it makes the load in these situations more predictable.
> > >
> > > How can that be achieved? I haven't done "scientific research" yet, but I guess an "MV partitioner" could do the trick. Instead of applying the regular partitioner, an MV partitioner would calculate the PK of the base table (which is always possible) and then apply the regular partitioner. (A sketch of this idea follows below.)
> > >
> > > I'll create a proper Jira for it on Monday. Currently it's Sunday here and my family wants me back, so just a few thoughts on this right now.
> > >
> > > Any feedback is appreciated!
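To make the proposal above concrete: a minimal, hypothetical Java sketch of the "MV partitioner" idea. The class and method names and the hash stand-in are invented for illustration (this is not Cassandra's actual IPartitioner API). It also shows exactly where DuyHai's objection bites: the base key is recoverable on the write path but not on the read path.

    import java.util.Map;

    // Hypothetical sketch of the proposed "MV partitioner"; invented names,
    // not Cassandra's real partitioner interface.
    final class MvPartitionerSketch
    {
        // Stand-in for the regular partitioner's hash (Murmur3 in practice).
        static long regularToken(String partitionKeyValue)
        {
            return partitionKeyValue.hashCode();
        }

        // WRITE path: a view row always carries every base-table PK column,
        // so the base key is recoverable, and the view row can be hashed to
        // the same token as its base row, i.e. it stays on the same node.
        static long viewWriteToken(Map<String, String> viewRow, String basePkColumn)
        {
            return regularToken(viewRow.get(basePkColumn));
        }

        // READ path: SELECT * FROM user_by_email WHERE email = 'xxx' supplies
        // only the view key ('email'). There is no user_id value to hash, so
        // the coordinator cannot route the query to a single node and would
        // have to fan out to all of them, exactly like a 2nd-index scan.
    }

Under this sketch, viewWriteToken(row, "user_id") would keep every user_by_email row on the same node as its base row (points 1 and 2 of the proposal), but on the read path there is nothing to pass in, which is DuyHai's point: single-node read routing, the main advantage of MVs over 2nd indexes, is lost.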
> > >
> > > 2017-03-05 6:34 GMT+01:00 Edward Capriolo <edlinuxg...@gmail.com>:
> > >
> > > > On Sat, Mar 4, 2017 at 10:26 AM, Jeff Jirsa <jji...@gmail.com> wrote:
> > > >
> > > > > On Mar 4, 2017, at 7:06 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> > > > >
> > > > > > On Fri, Mar 3, 2017 at 12:04 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> > > > > >
> > > > > > > On Fri, Mar 3, 2017 at 5:40 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I used them. I built do-it-yourself secondary indexes with them. They have their gotchas, but so do all the secondary index implementations. Just because DataStax does not write about something doesn't mean it isn't used. Let's see, about 5 years ago there was this: https://github.com/hmsonline/cassandra-triggers
> > > > > > >
> > > > > > > Still in use? How'd it work? Production ready? Would you still do it that way in 2017?
> > > > > > >
> > > > > > > > There is a fairly large divergence between what actual users do and what other groups 'say' actual users do in some cases.
> > > > > > >
> > > > > > > A lot of people don't share what they're doing (for business reasons, or because they don't think it's important, or because they don't know how/where), and that's fine, but it makes it hard for anyone to know what features are used, or how well they're really working in production.
> > > > > > >
> > > > > > > I've seen a handful of "how do we use triggers" questions in IRC, and they weren't unreasonable questions, but they seemed like a lot of pain, and more than one of those people ultimately came back and said they used some other mechanism (and of course, some of them silently disappear, so we have no idea if it worked or not).
> > > > > > >
> > > > > > > If anyone's actively using triggers, please don't keep it a secret. Knowing that they're being used would be a great way to justify continuing to maintain them.
> > > > > > >
> > > > > > > - Jeff
> > > > > >
> > > > > > "Still in use? How'd it work? Production ready? Would you still do it that way in 2017?"
> > > > > >
> > > > > > I mean, that is a loaded question. How long has Cassandra had secondary indexes? Did they work well? Would you use them? How many times were they re-written?
> > > > >
> > > > > It wasn't really meant to be a loaded question; I was being sincere.
> > > > >
> > > > > But I'll answer: secondary indexes suck for many use cases, but they're invaluable for their actual intended purpose, and I have no idea how many times they've been rewritten, but they're production ready for their narrow use case (defined by cardinality).
> > > > >
> > > > > Is there a real triggers use case still? Alternative to MVs? Alternative to CDC? I've never implemented triggers - since you have, what's the level of surprise for the developer?
> > > >
> > > > :) You mention alternatives; let's break them down.
> > > >
> > > > MV:
> > > > They seem to have a lot of promise. I.e., you can use them for things other than equality searches, and I do think the CQL example with the top-N high scores is pretty useful. Then again, our buddy Mr. Roth has a thread named "Rebuild / remove node with MV is inconsistent". I actually think a lot of the use cases for MVs fall into the category of "something you should actually be doing with Storm". I can vibe with the concept of not needing a streaming platform, but I KNOW Storm would do this correctly. I don't want to land on something like secondary index v1/v2, where there were fundamental flaws at scale. (Not saying this is the case, but the rebuild thing seems a bit scary.)
> > > >
> > > > CDC:
> > > > I'm slightly afraid of this. Rationale: an extensible piece designed specifically for a closed-source implementation of hub-and-spoke replication. I have some experience trying to "play along" with extensible things; see https://issues.apache.org/jira/browse/CASSANDRA-12627 ("Thus, I'm -1 on {{PropertyOrEnvironmentSeedProvider}}.")
> > > >
> > > > Not a dig, but I can't even get something committed using an existing extensible interface. Heaven forbid a use case of mine would want to *change* the interface; I would probably get a -12. So I have no desire to try and maintain a CDC implementation. I see myself falling into the same old "why do you want to do this? -1" trap.
> > > >
> > > > Coordinator triggers:
> > > > To bring things back, these are the really old-school coordinator triggers everyone always wanted. In a nutshell, I DO believe they are easier to reason about than MVs. It is pretty basic: it happens on the coordinator, there are no batchlogs or whatever; it is best effort, possibly touching more nodes, as the keys might live on different servers. Actually, I tend to like features like that. Once something comes to the downswing of the "software hype cycle", you know it is pretty stable, as everyone's all excited about other things.
> > > >
> > > > As I said, I know I can use Storm for top-N, so what is this feature for? Well, I want to optimize my network transfer, generally by building my batch mutations on the server. Seems reasonable. Maybe I want to have my own little "read before write" thing like CQL lists. (A minimal trigger sketch follows after this message.)
> > > >
> > > > The warts, having tried it: the first time I tried it, I found it did not work with non-batches; I patched that in 3 hours. It took weeks before some CQL user hit the same problem and it got fixed :) There was no dynamic loading at the time, so it was BYO class loader. That's what you get for going against the grain.
> > > >
> > > > The thing you have to realize with best-effort coordinator triggers is that the "transaction" could be incomplete, and, well, that sucks for some cases. But I actually felt the secondary index implementations force all problems into a type of "foreign key transactional integrity" that does not make sense for Cassandra.
> > > >
> > > > Have you ever used Elasticsearch? Their version of consistency is: write something, keep reading, and eventually you see it. Wildly popular :) It is a crazy world.
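For readers who have not seen one: the coordinator trigger Edward describes is a Java class loaded by the server. A minimal sketch, assuming the Cassandra 3.x ITrigger interface; the class name and the no-op body are placeholders:

    import java.util.Collection;
    import java.util.Collections;

    import org.apache.cassandra.db.Mutation;
    import org.apache.cassandra.db.partitions.Partition;
    import org.apache.cassandra.triggers.ITrigger;

    // Invoked on the coordinator for each write to the table the trigger is
    // attached to.
    public class ExampleTrigger implements ITrigger
    {
        public Collection<Mutation> augment(Partition update)
        {
            // Inspect the incoming write here and build extra Mutations to be
            // applied alongside it (e.g. a hand-rolled index row, or the
            // server-side batch building mentioned above). Returning an empty
            // collection leaves the original write untouched. As noted in the
            // thread, any extra mutations are best effort: there is no
            // batchlog, so the "transaction" can end up incomplete.
            return Collections.emptyList();
        }
    }

The compiled jar goes into the server's triggers directory, and the class is attached with CQL along the lines of CREATE TRIGGER example ON ks.tbl USING 'ExampleTrigger'; (the keyspace, table, and trigger names here are placeholders).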