Re: [DISCUSS] CEP-25: Trie-indexed SSTable format

2022-11-22 Thread Benedict
I don’t think there’s any requirement to run general testing with every storage variant, except perhaps pre-release. The idea is to look for regressions in the modified areas of the codebase, and if the storage layer hasn’t been changed it doesn’t make sense to confuse or slow down testing IMO.

Re: [DISCUSS] CEP-25: Trie-indexed SSTable format

2022-11-22 Thread Josh McKenzie
Strong +1 for the proposal here.

> One of the questions that we want to ask is whether anyone objects to
> maintaining full compatibility with existing files created by DataStax
> Enterprise.

No concerns here. So long as it's clear in the implementation what it is and why it's there I don't see

Re: [DISCUSS] CEP-25: Trie-indexed SSTable format

2022-11-22 Thread Jacek Lewandowski
+1 for the proposal!

By the way, regarding tests: perhaps we will have to let the Python DTests run with either the new or the old format.

Thanks,
Jacek Lewandowski

On Mon, Nov 21, 2022 at 3:06 PM Benedict wrote:
> Yes of course, this was absolutely just a query and not a

Re: [DISCUSS] CEP-25: Trie-indexed SSTable format

2022-11-21 Thread Benedict
Yes of course, this was absolutely just a query and not a precondition for this work. It stands on its own in my view, and I’m already ready to +1 the proposal.

> On 21 Nov 2022, at 13:55, Branimir Lambov wrote:
>
> I see. This does make a lot of sense for full row indexing, and also if one

Re: [DISCUSS] CEP-25: Trie-indexed SSTable format

2022-11-21 Thread Branimir Lambov
I see. This does make a lot of sense for full row indexing, and also if one can specify sub-kb granularity (at the current default we just won't have an index in these cases). How does opening a ticket to do these two* after the current code is committed sound?

* embedded index for sub-X-byte part

Re: [DISCUSS] CEP-25: Trie-indexed SSTable format

2022-11-21 Thread Benedict
Buffering on write up to at most one page seems fine? Once you are past a single page, it’s fine to write either to the end of the partition or to a separate file; there’s nothing much to be gained. But especially for small partitions there’s likely significant value in prepending it? It might be pref
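
One possible reading of the suggestion above, as a minimal Python sketch. It is not code from Cassandra or from the CEP; `write_partition`, `PAGE_SIZE`, and the toy index (one 4-byte offset per row) are hypothetical names and formats chosen only for illustration. The idea shown is that a partition small enough to fit in a one-page buffer gets its row index written ahead of its rows, while a larger partition has its index written after the rows.

```python
import io

PAGE_SIZE = 4096  # assumed page size in bytes; not taken from the thread


def write_partition(out, rows, page_size=PAGE_SIZE):
    """Write one partition and return where its row index was placed.

    `rows` is a list of already-serialized rows (bytes). The index is a
    hypothetical stand-in: one 4-byte big-endian row offset per row.
    """
    # Offset of each row relative to the start of the partition's row data.
    offsets, pos = [], 0
    for row in rows:
        offsets.append(pos)
        pos += len(row)
    index = b"".join(o.to_bytes(4, "big") for o in offsets)

    if pos <= page_size:
        # Small partition: all rows would fit in a one-page write buffer,
        # so the index can still be emitted ahead of them.
        out.write(index)
        out.write(b"".join(rows))
        return "prepended"

    # Large partition: rows go out first and the index follows at the end
    # (a separate index file is the other option mentioned above).
    for row in rows:
        out.write(row)
    out.write(index)
    return "appended"


small = io.BytesIO()
placement_small = write_partition(small, [b"a" * 100, b"b" * 200])

large = io.BytesIO()
placement_large = write_partition(large, [b"x" * 3000, b"y" * 3000])
```

Here the 300-byte partition has its index prepended, and the 6000-byte partition has it appended.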

Re: [DISCUSS] CEP-25: Trie-indexed SSTable format

2022-11-21 Thread Branimir Lambov
There is no intention to introduce any new versions of the format specifically for DSE. If there are any further changes to the format, they will be OSS-first. In other words this support only extends to preexisting versions of the format. Inline row index in the data file is not something we have

Re: [DISCUSS] CEP-25: Trie-indexed SSTable format

2022-11-21 Thread Benedict
Personally very pleased to see this proposal, and I’m not opposed to easing your migration by maintaining some light support for internal file versions - though would prefer the support have some version limit where it can be excised (maybe for one minor version bump?) One implementation questi

[DISCUSS] CEP-25: Trie-indexed SSTable format

2022-11-21 Thread Branimir Lambov
Hi everyone,

We would like to put CEP-25 up for discussion.

https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-25%3A+Trie-indexed+SSTable+format

The proposal describes DSE's Big Trie-indexed SSTable format, which replaces the primary index with on-disk tries to improve lookup performance and
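
For readers unfamiliar with trie indexes, the general idea of mapping keys to data-file positions through a trie can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not the on-disk layout described in CEP-25; `TrieNode`, `TrieIndex`, and the example offsets are hypothetical.

```python
class TrieNode:
    """One trie node: child nodes keyed by byte value, plus an optional payload."""

    __slots__ = ("children", "offset")

    def __init__(self):
        self.children = {}   # byte value (int) -> TrieNode
        self.offset = None   # data-file offset if a key ends at this node


class TrieIndex:
    """Toy byte-wise trie mapping partition keys to data-file offsets."""

    def __init__(self):
        self.root = TrieNode()

    def put(self, key: bytes, offset: int) -> None:
        node = self.root
        for b in key:
            # Keys with a common prefix share the nodes for that prefix.
            if b not in node.children:
                node.children[b] = TrieNode()
            node = node.children[b]
        node.offset = offset

    def get(self, key: bytes):
        node = self.root
        for b in key:
            node = node.children.get(b)
            if node is None:
                return None  # no stored key has this prefix
        return node.offset   # None if `key` is only a prefix of stored keys


index = TrieIndex()
index.put(b"apple", 0)
index.put(b"apply", 4096)
index.put(b"banana", 8192)
```

A lookup walks at most one node per key byte, so its cost depends on the key length rather than on the number of keys stored; `index.get(b"apply")` follows the shared `appl` prefix and then one more node.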