Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library
to Benedict:

well ... I was not around when the decision about the usage of Chronicle Queues was made. I think that at that time it was the most obvious candidate without reinventing the wheel, given the features and capabilities it had, so taking something off the shelf was a natural conclusion.

Josh / Jordan:

not only FQL but Audit as well; these are two separate things. There is also quite a "rich" ecosystem around that.

1) nodetool commands like

enableauditlog
enablefullquerylog
disableauditlog
disablefullquerylog
getauditlog
getfullquerylog

Also, because the files it produces are binary, we need special tooling to inspect them. It is in tools/fqltool with a bunch of classes, and there is also an AuditLogViewer for reviewing audit logs.

There are MBean methods enabling the nodetool commands.

We have also shipped that in two major releases (4.0 and now in 5.0) so the community is quite well used to this, they have their processes set up around it etc.

I mention all this because it is just not so easy to replace it with something else if somebody wanted that, in any case. How do we even go about deprecating this if we are indeed going to replace it?

To discuss the release aspect they have in place: I think you are right that the latest ea is as close as possible, if not the same, as what they release privately. Yes. But if we want to stick to the rule that we upgrade only to the latest ea release before their next minor, then

1) we will always be at least one minor late
2) we do not know when they make up their minds to transition to a new minor so we can upgrade to the latest ea one minor before
3) if something is broken and we need to fix it and we are on ea, then what we get to update to is the latest ea at that time, which might fix the issue but will also bring new stuff in, which might open doors to instability as well. So we update to fix the bugs but we might include new ones unknowingly.

Anyway, I don't think this has any silver bullet solution, we might just stick to the latest "ea" and be done with it. I do not expect this project to evolve wildly and unpredictably, it just solves "one problem", there is basically nothing new coming in.

Brandon:

I understand your concerns about phoning home but

1) we already resolved this by setting the respective property
2) I do not think that Chronicle will mess with this now that they have introduced it. There is nothing to "improve" or "change" there. It is phoning home or not and it is driven by one property. If they made a change so that we could not turn it off then we would really be in trouble, but for now we are not and practically speaking I don't expect this would change.

I know that this might sound like wishful thinking but in practical terms I really just don't expect this phoning home thing would ever come back.

Speaking of alternatives, I think the primary reason Chronicle was used is this (1).

"It's goal is good enough performance, predictable footprint, simplicity in terms of implementation and configuration and most importantly minimal impact on producers of log records."

While I understand English (I guess, well enough :D), I just don't understand what "good enough performance" is. How is this measured? What is a "predictable footprint"? Was that measured too? How did we quantify that?

"Performance safety is accomplished by feeding items to the binary log using a weighted queue and dropping records if the binary log falls sufficiently far behind."

This is interesting. If I understand correctly, the messages are weighted and the heavier they are, the more probable it is that they will be dropped when it is overloaded? Or vice versa, the lighter ones are dropped first?

Have we _ever_ experienced in production that some log events were really dropped? Has anybody ever hit that?

When it comes to alternatives, what about logback + slf4j?
It has appenders where we want, it is sync / async, we can code some nio appender too I guess, and it logs as text into a file so we do not need any special tooling to review that. For tailing, which Chronicle also offers, I guess "tail -f that.log" just does the job? logback even rolls the files after some configured period / size, the same way Chronicle does (it even compresses the logs).

Do we log so much that battle-tested logback is just absolutely not enough for us? Come on, this is not rocket science that we need to use a library from the realm of "high frequency trading" to just append queries and audit logs as they are executed. logback can handle the load we have just fine imo ...

Or maybe I am completely wrong and we just HAVE TO use Chronicle?

(1) https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/binlog/BinLog.java#L58-L69

On Tue, Sep 17, 2024 at 3:12 AM Brandon Williams wrote:

> My concern is that we have to keep making sure it's not phoning home(1,2).
>
> (1) https://issues.apache.org/jira/browse/CASSANDRA-
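
As a rough illustration of the "append text and roll by size" idea in the message above, here is a minimal plain-Java sketch. It is a stand-in written only with the JDK, not logback's API: the class name `RollingTextLog`, the `audit.N.log` file naming and the size threshold are all hypothetical, and it leaves out time-based rolling, compression and async handoff, which logback provides.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical size-based rolling text log (illustration only, not logback).
 * Each record is one UTF-8 line, so the files can be read with tail or less.
 */
final class RollingTextLog {
    private final Path dir;
    private final String baseName;
    private final long maxBytes;
    private int index;     // index of the file currently being written
    private long written;  // bytes written to the current file so far

    RollingTextLog(Path dir, String baseName, long maxBytes) {
        this.dir = dir;
        this.baseName = baseName;
        this.maxBytes = maxBytes;
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Path of the file that the next append will go to (unless it rolls). */
    synchronized Path activeFile() {
        return dir.resolve(baseName + "." + index + ".log");
    }

    /** Appends one line; starts a new file first if the current one would exceed maxBytes. */
    synchronized void append(String line) {
        byte[] bytes = (line + "\n").getBytes(StandardCharsets.UTF_8);
        if (written > 0 && written + bytes.length > maxBytes) {
            index++;
            written = 0;
        }
        try {
            Files.write(activeFile(), bytes,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        written += bytes.length;
    }
}

final class RollingTextLogDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("audit-demo");
        RollingTextLog log = new RollingTextLog(dir, "audit", 64);
        for (int i = 0; i < 10; i++) {
            log.append("user=cassandra query=SELECT * FROM ks.tbl WHERE id=" + i);
        }
        System.out.println("active file: " + log.activeFile());
    }
}
```

Whether something this simple would be "good enough" for FQL and audit volumes is exactly the open question in the thread; the sketch only shows the moving parts, not their performance.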
Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library
My point is only that AFAICT we use it for something incredibly basic that we do all the time elsewhere without it.

I’m not proposing we remove it, I don’t have a position on that. But if we don’t _trust_ ourselves to replace it we should get out of the database game.

The fact it would break compatibility between releases is suboptimal, but IMO not at all a dealbreaker because these files are not required to be compatible between versions - they’re offline logs and I think it would be fine to require different viewers for files produced by different versions of Cassandra.

I do not think any of the nodetool methods would be affected by this, as they do not appear to touch the contents of the log files.

On 17 Sep 2024, at 09:28, Štefan Miklošovič wrote:

> to Benedict:
>
> well ... I was not around when the decision about the usage of Chronicle Queues was made. I think that at that time it was the most obvious candidate without reinventing the wheel given the features and capabilities it had so taking something off the shelf was a natural conclusion.
>
> Josh / Jordan:
>
> not only FQL but Audit as well these are two separate things. There is also quite a "rich" ecosystem around that.
>
> 1) nodetool commands like
>
> enableauditlog
> enablefullquerylog
> disableauditlog
> disablefullquerylog
> getauditlog
> getfullquerylog
>
> Also, because the files it produces are binary, we need a special tooling to inspect it, it is in tools/fqltool with a bunch of classes, and there is also an AuditLogViewer for reviewing audit logs.
>
> There are MBean methods enabling nodetool commands.
>
> We have also shipped that in two major releases (4.0 and now in 5.0) so the community is quite well used to this, they have the processes set around this etc.
>
> I mention this all because it is just not so easy to replace it with something else if somebody wanted that, in any case.
Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library
There are configuration properties related to controlling what that bin log does at runtime, so if we completely changed the vehicle it operates on then the only thing which would stay in common is the name of the command and the logical operation it does (enable / disable, get the config if there is any) ...

If we ever make another solution happen, I think it would be better if we just kept the old stuff in and developed something in parallel, and when it is stable enough we would ditch the old solution.

BTW I have one technical question here, directed not to Benedict, even though I am replying to him, but to the broader audience out there:

If this in the javadocs is true, as I already linked above:

"Performance safety is accomplished by feeding items to the binary log using a weighted queue and dropping records if the binary log falls sufficiently far behind."

then how is it possible that FQL works? If there is a chance to drop some events, and hence to drop an actual query which was executed, then when we replay the logs (the FQL framework can replay the logs against an empty database) there is no guarantee that we actually get the same state of the database after the replay? So FQL is in this sense a "best effort" kind of tooling?

On Tue, Sep 17, 2024 at 10:37 AM Benedict wrote:

> My point is only that AFAICT we use it for something incredibly basic that
> we do all the time elsewhere without it.
>
> I’m not proposing we remove it, I don’t have a position on that. But if we
> don’t _trust_ ourselves to replace it we should get out of the database
> game.
>
> The fact it would break compatibility between releases is suboptimal, but
> IMO not at all a dealbreaker because these files are not required to be
> compatible between versions - they’re offline logs and I think it would be
> fine to require different viewers for files produced by different versions
> of Cassandra.
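
To make the weighted-queue question from this message concrete, here is a minimal sketch of one way a weight-bounded, non-blocking queue could behave. It is an assumption for illustration only and has not been checked against BinLog: the class name `WeightedDropQueue`, the use of record size as weight and the budget of 100 are all hypothetical. Under this assumed design a record is dropped whenever it does not fit into the remaining weight budget, so heavier records are the ones more likely to be dropped when the queue is nearly full.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Hypothetical sketch of a weight-bounded queue that drops instead of blocking.
 * This is NOT Cassandra's BinLog or its queue; all names are illustrative.
 */
final class WeightedDropQueue<T> {
    private final Queue<T> items = new ArrayDeque<>();
    private final Queue<Long> weights = new ArrayDeque<>();
    private final long maxWeight;
    private long currentWeight;
    private long dropped;

    WeightedDropQueue(long maxWeight) {
        this.maxWeight = maxWeight;
    }

    /** Producer side: never blocks; returns false and counts a drop if the record does not fit. */
    synchronized boolean offer(T item, long weight) {
        if (currentWeight + weight > maxWeight) {
            dropped++;
            return false;
        }
        items.add(item);
        weights.add(weight);
        currentWeight += weight;
        return true;
    }

    /** Writer side: removes the oldest record and frees its weight; returns null when empty. */
    synchronized T poll() {
        T item = items.poll();
        if (item != null) {
            currentWeight -= weights.remove();
        }
        return item;
    }

    synchronized long droppedCount() {
        return dropped;
    }
}

final class WeightedDropQueueDemo {
    public static void main(String[] args) {
        WeightedDropQueue<String> queue = new WeightedDropQueue<>(100);
        System.out.println(queue.offer("INSERT ...", 60)); // true: 60 fits into 100
        System.out.println(queue.offer("BATCH ...", 50));  // false: 60 + 50 exceeds 100
        System.out.println(queue.offer("SELECT ...", 30)); // true: 60 + 30 still fits
        System.out.println("dropped = " + queue.droppedCount()); // dropped = 1
    }
}
```

If records can be dropped this way while the writer is behind, a replay of the resulting log would simply be missing those statements, which is the "best effort" concern raised above. Whether the real implementation drops or blocks in a given configuration is something to confirm in the BinLog source linked earlier in the thread.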
Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library
All of these options are managed by us, the only property that is passed through to chronicle is the “RollCycle” that we can trivially replicate, or that we could simply deprecate.

On 17 Sep 2024, at 09:57, Štefan Miklošovič wrote:

> There are configuration properties related to controlling what that bin log does in runtime so if we completely changed the vehicle it operates on then the only thing which would stay in common is the name of the command and the logical operation it does (enable / disable, get the config if there is any) ...
>
> If we ever make another solution happen, I think it would be better if we just kept the old stuff in and developed something parallel and when it is stable enough we would ditch the old solution.
>
> BTW I have one technical question here, not directed to Benedict as I reply him but to the broader audience out there:
>
> If this in the javadocs is true as I linked that above already:
>
> "Performance safety is accomplished by feeding items to the binary log using a weighted queue and dropping records if the binary log falls sufficiently far behind."
>
> then how is it possible that FQL works? If there is a chance to drop some events, hence we dropped the actual query which was executed, then when we replay the logs (FQL framework can replay the logs against an empty database), then there is no guarantee that we actually get the same state of the database after it is replayed? So FQL is in this sense "the best effort" kind of tooling?
>
> On Tue, Sep 17, 2024 at 10:37 AM Benedict wrote:
>
> > My point is only that AFAICT we use it for something incredibly basic that we do all the time elsewhere without it.
> >
> > I’m not proposing we remove it, I don’t have a position on that.