On Tue, Mar 28, 2023 at 7:30 AM Jeremiah D Jordan <jeremiah.jor...@gmail.com> wrote:
> - Resources isolation. Having the said service running within the same JVM
> may negatively impact Cassandra storage's performance. It could be more
> beneficial to have them in Sidecar, which offers strong resource isolation
> guarantees.
>
> How does having this in a side car change the impact on “storage
> performance”? The side car reading sstables will have the same impact on
> storage IO as the main process reading sstables.

This is true.

> Given the sidecar is running on the same node as the main C* process, the
> only real resource isolation you have is in heap/GC? CPU/Memory/IO are all
> still shared between the main C* process and the side car, and coordinating
> those across processes is harder than coordinating them within a single
> process. For example if we wanted to have the compaction throughput,
> streaming throughput, and analytics read throughput all tied back to a
> single disk IO cap, that is harder with an external process.

Relatively trivial, for CPU and memory, to run them in different containers/cgroups/etc, so you can put an exact cpu/memory limit on the sidecar. That's different from a jmx rate limiter/throttle, but (arguably) more precise, because it actually limits the underlying physical resource instead of a proxy for it in a config setting.

> - Complexity. Considering the existence of the Sidecar project, it would
> be less complex to avoid adding another (http?) service in Cassandra.
>
> Not sure that is really very complex, running an http service is a pretty
> easy? We already have netty in use to instantiate one from.
> I worry more about the complexity of having the matching schema for a set
> of sstables being read. The complexity of new sstable versions/formats
> being introduced. The complexity of having up to date data from memtables
> being considered by this API without having to flush before every query of
> it. The complexity of dealing with the new memtable API introduced in
> CEP-11. The complexity of coordinating compaction/streaming adding and
> removing files with these APIs reading them. There are a lot of edge cases
> to consider for this external access to sstables that the main process
> considers itself the “owner” of.
>
> All of this is not to say that I think separating things out into other
> processes/services is bad. But I think we need to be very careful with how
> we do it, or end users will end up running into all the sharp edges and the
> feature will fail.
>
> -Jeremiah
>
> On Mar 24, 2023, at 8:15 PM, Yifan Cai <yc25c...@gmail.com> wrote:
>
> Hi Jeremiah,
>
> There are good reasons to not have these inside Cassandra. Consider the
> following.
> - Resources isolation. Having the said service running within the same JVM
> may negatively impact Cassandra storage's performance. It could be more
> beneficial to have them in Sidecar, which offers strong resource isolation
> guarantees.
> - Availability. If the Cassandra cluster is being bounced, using sidecar
> would not affect the SBR/SBW functionality, e.g. SBR can still read
> SSTables via sidecar endpoints.
> - Compatibility. Sidecar provides stable REST-based APIs, such as
> uploading SSTables endpoint, which would remain compatible with different
> versions of Cassandra. The current implementation supports versions 3.0 and
> 4.0.
> - Complexity. Considering the existence of the Sidecar project, it would
> be less complex to avoid adding another (http?) service in Cassandra.
> - Release velocity. Sidecar, as an independent project, can have a quicker
> release cycle from Cassandra.
> - The features in sidecar are mostly implemented based on various existing
> tools/APIs exposed from Cassandra, e.g. ring, commit sstable, snapshot, etc.
>
> Regarding authentication and authorization
> - We will add it as a follow-on CEP in Sidecar, but we don't want to hold
> up this CEP. It would be a feature that benefits all Sidecar endpoints.
>
> - Yifan
>
> On Fri, Mar 24, 2023 at 2:43 PM Doug Rohrer <droh...@apple.com> wrote:
>
>> I agree that the analytics library will need to support vnodes. To be
>> clear, there’s nothing preventing the solution from working with vnodes
>> right now, and no assumptions about a 1:1 topology between a token and a
>> node. However, we don’t, today, have the ability to test vnode support
>> end-to-end. We are working towards that, however, and should be able to
>> remove the caveat from the released analytics library once we can properly
>> test vnode support.
>> If it helps, I can update the CEP to say something more like “Caveat:
>> Currently untested with vnodes - work is ongoing to remove this limitation”
>> if that helps?
>>
>> Doug
>>
>> > On Mar 24, 2023, at 11:43 AM, Brandon Williams <dri...@gmail.com> wrote:
>> >
>> > On Fri, Mar 24, 2023 at 10:39 AM Jeremiah D Jordan
>> > <jeremiah.jor...@gmail.com> wrote:
>> >>
>> >> I have concerns with the majority of this being in the sidecar and not
>> >> in the database itself. I think it would make sense for the server side of
>> >> this to be a new service exposed by the database, not in the sidecar. That
>> >> way it can be able to properly integrate with the authentication and
>> >> authorization apis, and to make it a first class citizen in terms of having
>> >> unit/integration tests in the main DB ensuring no one breaks it.
>> >
>> > I don't think this can/should happen until it supports the database's
>> > default configuration with vnodes.
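
To make the "single disk IO cap" point from the top of the thread concrete: inside one process, compaction, streaming, and analytics reads can all draw from the same token bucket, which is the part that gets harder once one of the readers lives in a separate process. The following is only a rough sketch under stated assumptions, not Cassandra or Sidecar code. `SharedIoBudget`, `try_acquire`, and the workload names are hypothetical, and the one-second burst capacity is an arbitrary choice.

```python
import threading
import time


class SharedIoBudget:
    """Hypothetical token bucket capping combined bytes/second for all callers."""

    def __init__(self, bytes_per_second, clock=time.monotonic):
        self.rate = float(bytes_per_second)
        # Assumption: allow at most one second's worth of burst.
        self.capacity = float(bytes_per_second)
        self.tokens = self.capacity
        self.clock = clock  # injectable so the behaviour can be tested deterministically
        self.last_refill = clock()
        self.lock = threading.Lock()
        self.granted = {}  # bytes granted so far, keyed by workload name

    def try_acquire(self, workload, nbytes):
        """Charge nbytes to the shared budget; return False if it does not fit."""
        with self.lock:
            # Refill in proportion to elapsed time, never above capacity.
            now = self.clock()
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last_refill = now

            if nbytes > self.tokens:
                return False

            self.tokens -= nbytes
            self.granted[workload] = self.granted.get(workload, 0) + nbytes
            return True
```

Because every workload calls `try_acquire` on the same object, bytes spent by compaction directly reduce what streaming or analytics reads can take in the same interval. An external process cannot share this object, so it would need either its own separately configured cap or some cross-process coordination, which is the difficulty described above. A cgroup limit, by contrast, is enforced by the kernel on the sidecar process as a whole.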