Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library
I feel like a group of us discussed this IRL a bit at ApacheCon in Vegas, maybe around 2019? Anyhoo, the tidbit sticking in my mind was someone explaining that the JVM string-operation overhead of text log concatenation, versus just slapping binary into CQ's off-heap append path, was substantial. We could do a hostile fork and bring the bits we use in-tree (a jerk move, but they started it with this weird release model). I'd rather avoid this, but it's an option, seeing as how it's ASLv2.

On Thu, 19 Sep 2024 at 5:08 AM, Jeremiah Jordan wrote:

>> When it comes to alternatives, what about logback + slf4j? It has appenders where we want, it is sync / async, we can code some nio appender too I guess, and it logs as text into a file so we do not need any special tooling to review it. For tailing, which Chronicle also offers, I guess "tail -f that.log" just does the job? logback even rolls the files after they are big enough, so it rolls the files the same way, after some configured period / size, as Chronicle does (it even compresses the logs).
>
> Yes, it was considered. The whole point was to have a binary log, because serialization to/from text (remember, replay is part of this) explodes the size on disk and in memory, as well as the processing time required, and does not meet the timing requirements of fqltool.
>
> -Jeremiah
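For illustration only, a minimal Java sketch of the contrast being described above (this is not fqltool or Chronicle Queue code; the class and method names are made up): the text path builds a throwaway String per entry and re-parses it on replay, while the binary path appends length-prefixed fields into a reusable (here direct) buffer.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class TextVsBinaryLogSketch
{
    // Text: allocates an intermediate String per entry, then encodes it to bytes.
    static byte[] textEntry(long timestamp, String keyspace, String query)
    {
        String line = timestamp + " " + keyspace + " " + query + System.lineSeparator();
        return line.getBytes(StandardCharsets.UTF_8);
    }

    // Binary: length-prefixed fields appended into a preallocated buffer,
    // no intermediate String, and trivially parseable on replay.
    static void binaryEntry(ByteBuffer buf, long timestamp, String keyspace, String query)
    {
        byte[] ks = keyspace.getBytes(StandardCharsets.UTF_8);
        byte[] q = query.getBytes(StandardCharsets.UTF_8);
        buf.putLong(timestamp)
           .putInt(ks.length).put(ks)
           .putInt(q.length).put(q);
    }

    public static void main(String[] args)
    {
        // Direct buffer stands in for an off-heap append target.
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 16);
        binaryEntry(buf, System.currentTimeMillis(), "ks", "SELECT * FROM t WHERE id = ?");
        System.out.println("binary bytes: " + buf.position());
        System.out.println("text bytes:   " + textEntry(System.currentTimeMillis(), "ks",
                                                        "SELECT * FROM t WHERE id = ?").length);
    }
}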
Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations
Is there any update on this topic? It seems things could make significant progress if Jake Luciani can find someone who can make the FileSystemProvider code accessible.

On Sat, Dec 16, 2023 at 05:29, Jon Haddad wrote:

> At a high level I really like the idea of being able to better leverage cheaper storage, especially object stores like S3.
>
> One important thing though - I feel pretty strongly that there's a big, deal-breaking downside. Backups, disk failure policies, snapshots and possibly repairs (which haven't been particularly great in the past) would get more complicated, and of course there's the issue of failure recovery being only partially possible if you're looking at a durable block store paired with an ephemeral one, with some of your data not replicated to the cold side. That introduces a failure case that's unacceptable for most teams, and it results in needing to implement potentially two different backup solutions. This is operationally complex, with a lot of surface area for headaches. I think a lot of teams would probably have an issue with the big question mark around durability, and I would probably avoid it myself.
>
> On the other hand, I'm +1 if we approach it slightly differently - where _all_ the data is located on the cold storage, with the local hot storage used as a cache. This means we can use the cold directories for the complete dataset, simplifying backups and node replacements.
>
> For a little background, we had a ticket several years ago where I pointed out it was possible to do this *today* at the operating system level as long as you're using block devices (vs an object store) and LVM [1]. For example, this works well with GP3 EBS w/ low IOPS provisioning + local NVMe to get a nice balance of great read performance without going nuts on the cost for IOPS. I also wrote about this in a little more detail in my blog [2]. There's also the new mount point tech in AWS, which pretty much does exactly what I've suggested above [3] and is probably worth evaluating just to get a feel for it.
>
> I'm not insisting we require LVM or the AWS S3 fs, since that would rule out other cloud providers, but I am pretty confident that the entire dataset should reside on the "cold" side of things for the practical and technical reasons listed above. I don't think it massively changes the proposal, and it should simplify things for everyone.
>
> Jon
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-8460
> [2] https://rustyrazorblade.com/post/2018/2018-04-24-intro-to-lvm/
> [3] https://aws.amazon.com/about-aws/whats-new/2023/03/mountpoint-amazon-s3/
>
> On Thu, Dec 14, 2023 at 1:56 AM Claude Warren wrote:
>
>> Is there still interest in this? Can we get some points down on electrons so that we all understand the issues?
>>
>> While it is fairly simple to redirect the read/write to something other than the local system for a single node, this will not solve the problem for tiered storage.
>>
>> Tiered storage will require that, on read/write, the primary key be assessed to determine whether the read/write should be redirected. My reasoning for this statement is that, in a cluster with a replication factor greater than 1, the node will store data for the keys that would be allocated to it in a cluster with a replication factor of 1, as well as some keys from nodes earlier in the ring.
>>
>> Even if we can get the primary keys for all the data we want to write to "cold storage" to map to a single node, a replication factor > 1 means that data will also be placed in "normal storage" on subsequent nodes.
>>
>> To overcome this, we have to explore ways to route data to different storage based on the keys, and that different storage may have to be available on _all_ the nodes.
>>
>> Have any of the partial solutions mentioned in this email chain (or others) solved this problem?
>>
>> Claude
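For illustration, a minimal Java sketch of the "local hot cache over a complete cold copy" read path Jon describes above. The paths, class, and method names are hypothetical; this is not Cassandra's ChannelProxy or any CEP-36 API, just the shape of "serve from fast local storage when present, otherwise pull from the cold tier and cache it locally."

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class TieredReadExample
{
    private final Path hotDir;   // fast local NVMe cache
    private final Path coldDir;  // mounted cold tier holding the complete dataset

    public TieredReadExample(Path hotDir, Path coldDir)
    {
        this.hotDir = hotDir;
        this.coldDir = coldDir;
    }

    // Open a data file for reading, caching it on the hot tier first if needed.
    public FileChannel openForRead(String fileName) throws IOException
    {
        Path hot = hotDir.resolve(fileName);
        if (!Files.exists(hot))
        {
            Path cold = coldDir.resolve(fileName);
            // Copy-through cache: the cold tier is authoritative, the hot tier is disposable.
            Files.createDirectories(hotDir);
            Files.copy(cold, hot, StandardCopyOption.REPLACE_EXISTING);
        }
        return FileChannel.open(hot, StandardOpenOption.READ);
    }

    public static void main(String[] args) throws IOException
    {
        TieredReadExample tier = new TieredReadExample(Paths.get("/mnt/nvme/data"),
                                                       Paths.get("/mnt/s3/data"));
        try (FileChannel ch = tier.openForRead("nb-1-big-Data.db"))
        {
            System.out.println("size=" + ch.size());
        }
    }
}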
Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library
> When it comes to alternatives, what about logback + slf4j? It has appenders where we want, it is sync / async, we can code some nio appender too I guess, and it logs as text into a file so we do not need any special tooling to review it. For tailing, which Chronicle also offers, I guess "tail -f that.log" just does the job? logback even rolls the files after they are big enough, so it rolls the files the same way, after some configured period / size, as Chronicle does (it even compresses the logs).

Yes, it was considered. The whole point was to have a binary log, because serialization to/from text (remember, replay is part of this) explodes the size on disk and in memory, as well as the processing time required, and does not meet the timing requirements of fqltool.

-Jeremiah
[ANNOUNCE] Apache Cassandra 4.0.14, 4.1.7, 5.0.1 test artefacts available
The test builds of Cassandra 4.0.14, 4.1.7 and 5.0.1 are available. A vote on these test builds will be initiated within the next couple of days.

== 4.0.14 ==

sha1: 7bf67349579411521bcdee4febd209cff63179a6
Git: https://github.com/apache/cassandra/tree/4.0.14-tentative
Maven Artifacts: https://repository.apache.org/content/repositories/orgapachecassandra-1345/org/apache/cassandra/cassandra-all/4.0.14/

The Source and Build Artifacts, and the Debian and RPM packages and repositories, are available here: https://dist.apache.org/repos/dist/dev/cassandra/4.0.14/

[1]: CHANGES.txt: https://github.com/apache/cassandra/blob/4.0.14-tentative/CHANGES.txt
[2]: NEWS.txt: https://github.com/apache/cassandra/blob/4.0.14-tentative/NEWS.txt

== 4.1.7 ==

sha1: ca494526025a480bc8530ed3ae472ce8c9cbaf7a
Git: https://github.com/apache/cassandra/tree/4.1.7-tentative
Maven Artifacts: https://repository.apache.org/content/repositories/orgapachecassandra-1347/org/apache/cassandra/cassandra-all/4.1.7/

The Source and Build Artifacts, and the Debian and RPM packages and repositories, are available here: https://dist.apache.org/repos/dist/dev/cassandra/4.1.7/

[1]: CHANGES.txt: https://github.com/apache/cassandra/blob/4.1.7-tentative/CHANGES.txt
[2]: NEWS.txt: https://github.com/apache/cassandra/blob/4.1.7-tentative/NEWS.txt

== 5.0.1 ==

sha1: c206e4509003ac4cd99147d821bd4b5d23bdf5e8
Git: https://github.com/apache/cassandra/tree/5.0.1-tentative
Maven Artifacts: https://repository.apache.org/content/repositories/orgapachecassandra-1348/org/apache/cassandra/cassandra-all/5.0.1/

The Source and Build Artifacts, and the Debian and RPM packages and repositories, are available here: https://dist.apache.org/repos/dist/dev/cassandra/5.0.1/

[1]: CHANGES.txt: https://github.com/apache/cassandra/blob/5.0.1-tentative/CHANGES.txt
[2]: NEWS.txt: https://github.com/apache/cassandra/blob/5.0.1-tentative/NEWS.txt
Re: [DISCUSS] The configuration of Commitlog archiving
Do you have any new updates on this DISCUSS?

- The reason this pattern is popular is that it allows extension of functionality ahead of the database. Some people copy to a NAS/SAN. Some people copy to S3. Some people copy to their own object storage that isn't S3-compatible. "Compress and move" is super limiting, because "move" varies remarkably between environments.

Yes, it is indeed very flexible to use it this way, but would it be more appropriate to decouple the archiving of files to heterogeneous storage and leave that to other systems to handle, while we only do compression and copying (file linking, like sstable incremental backup)?

On Thu, Sep 5, 2024 at 04:18, Štefan Miklošovič wrote:

> On Wed, Sep 4, 2024 at 8:34 PM Jon Haddad wrote:
>
>> I thought about this a bit over the last few days, and there are actually quite a few problems present that would need to be addressed.
>>
>> *Insecure JMX*
>>
>> First off - if someone has access to JMX, the entire system is already compromised. A bad actor can mess with the cluster topology, truncate tables, and do a ton of other disruptive stuff. But if we're going to go down this path, I think we should apply your logic consistently to avoid creating a "solution" that has the same "problem" as we do now. I use quotes because I'm not entirely convinced the root cause of the problem is enabling some shell access, but I'll entertain it for the sake of the discussion.
>>
>> *Dynamic Configuration and Shell Scripts*
>>
>> Let's pretend that somehow an open JMX isn't already a *massive* security flaw by itself. Once an attacker has control of a system, the next phase of the attack relies on them dynamically changing the configuration to point to a different shell script, or to execute arbitrary shell scripts. I agree with the general idea that we don't want this - so in my mind the necessary solution here is to disable the ability to change the commit log archiving behavior at runtime.
>>
>> The idea that commit log archiving (and many other config settings) would be dynamically configurable is a massive security flaw that should be disallowed. If you want to take this a step further and claim there's a flaw with shell scripts in general, I'll even entertain that for a minute, but we need to examine whether the proposed solution of moving code to Java actually solves the problem.
>>
>> *Dynamic Configuration and Java Code*
>>
>> Let's say we've removed the ability to use shell scripts, and we've gotten people to rewrite their shell code in Java, but we've left the dynamic configuration in. Going back to my original email, I mentioned copying commit logs off the node and into an object store. If someone is able to change the parameter dynamically at runtime, they could just as easily point to a public S3 bucket, and commit logs would be archived there, which is just as bad as the shell version. So if we are to convert this functionality to Java, we should also be making best-practice recommendations on what users should and should not do.
>
> I think what you mean here is that if we allowed people to provide a pluggable way of copying the data, and they coded it up, put that JAR on the class path, (re)started Cassandra, etc., then someone might reconfigure this custom solution at runtime? Yeah, we do not want this. We can make it pluggable, but not reconfigurable. With it pluggable and not reconfigurable, in order to replace it with something else an attacker would basically need to restart Cassandra with a rogue JAR on the class path. If they can do that, I think the system is beyond any salvation and completely compromised anyway.
>
>> *Apply All Operational Best Practices*
>>
>> There's been a variety of examples of how a user can further compromise a machine once they have JMX, working in tandem with shell scripts, but I hope at this point you can see that the issue is fundamentally more complex than simply disallowing shell scripts. The issue is present in the Java examples as well, and is strongly tied to the issue of dynamic config. If we're to design this the "right" way, I think we'd want these properties:
>>
>> * Commit log archiving should only have the ability to compress and move files to a staging location
>> * Once the files are moved to the staging location, the files should be moved somewhere else by a script NOT run as the C* user
>> * The commit log archive configuration should not be dynamically updatable, nor should any config affecting directories
>
> This would essentially copy the logic we have for snapshots, as Jordan mentioned. I do not mind having it like that. It is a good question why exactly we need it to be reconfigurable at all. Why is it like that? People do not want to restart a whole cluster consisting of 100 nodes when the destination of the arch
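For illustration, a minimal Java sketch of the "pluggable, but not reconfigurable" idea discussed above. The interface, class, and property names are hypothetical, not Cassandra's actual commit log archiver API: the implementation is resolved once from a startup-time property and has no setter or JMX hook, so swapping it requires a new JAR on the classpath and a restart.

import java.nio.file.Path;

public final class ArchiverPluginSketch
{
    // Hypothetical SPI: implementations may only compress/copy to a staging area.
    public interface CommitLogArchiver
    {
        void archive(Path segment) throws Exception;
    }

    // Resolved exactly once at startup; deliberately no setter and no JMX exposure.
    private static final CommitLogArchiver ARCHIVER = load();

    private static CommitLogArchiver load()
    {
        String impl = System.getProperty("example.commitlog_archiver"); // hypothetical property
        if (impl == null)
            return segment -> { /* no-op default: archiving disabled */ };
        try
        {
            return (CommitLogArchiver) Class.forName(impl).getDeclaredConstructor().newInstance();
        }
        catch (Exception e)
        {
            throw new IllegalStateException("Cannot load archiver " + impl, e);
        }
    }

    public static CommitLogArchiver archiver()
    {
        return ARCHIVER;
    }
}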