Re: SolrCloud separating compute from storage

Mikhail Khludnev Wed, 12 Jul 2023 09:14:16 -0700

Keeping the from-scratch speculation mode on...

On Wed, Jul 12, 2023 at 3:45 PM Ilan Ginzburg <ilans...@gmail.com> wrote:


> I think mandating the use of Kafka to switch to this mode of separating
> compute and storage would make adoption harder. One would also need a
> deployment of Kafka that is resilient to an AZ going down. We get this "for
> free" by using S3/GCS or similar.
>
That sounds like a reasonable requirement for a vendor-agnostic abstraction,
like the one we have for S3. In this case, managed offerings for Kafka (AWS MSK)
and its analogs (GCP Pub/Sub) simplify adoption.

> Moreover, the transaction log will be per shard but we can't afford a Kafka
> topic per shard (in our use case we have thousands of shards on the
> cluster). Addressing all this will likely complexify a design that will
> lose its initial beauty and appeal.
>
Good point. But can't they be sharded transparently? What if we run N
IndexWriters, and each of them consumes 1 (but optionally up to N) shards
of a topic?
Presumably, we can take Solr Cross DC from solr-sandbox and turn the
transaction log off.
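
To make the speculation above a bit more concrete, here is a minimal sketch of
mapping thousands of Solr shards onto a small, fixed number of topic
partitions, with each of N writers consuming one or more partitions. It does
not use any Kafka or Solr API; `partition_for`, `assign_partitions`, and
`route` are hypothetical names, and the CRC32 hashing and round-robin
assignment are assumptions chosen only for illustration.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_for(shard_id: str, num_partitions: int) -> int:
    """Map a shard to a topic partition with a stable hash (assumed scheme)."""
    return zlib.crc32(shard_id.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int, num_writers: int) -> Dict[int, List[int]]:
    """Spread partitions over writers round-robin; a writer may own several."""
    assignment: Dict[int, List[int]] = defaultdict(list)
    for partition in range(num_partitions):
        assignment[partition % num_writers].append(partition)
    return dict(assignment)


def route(
    updates: Iterable[Tuple[str, object]],
    num_partitions: int,
    num_writers: int,
) -> Dict[int, List[Tuple[str, object]]]:
    """Group (shard_id, update) pairs by the writer that owns their partition.

    All updates of one shard land on the same writer, in their original order.
    """
    owner = {
        partition: writer
        for writer, partitions in assign_partitions(num_partitions, num_writers).items()
        for partition in partitions
    }
    per_writer: Dict[int, List[Tuple[str, object]]] = defaultdict(list)
    for shard_id, update in updates:
        partition = partition_for(shard_id, num_partitions)
        per_writer[owner[partition]].append((shard_id, update))
    return dict(per_writer)
```

With this kind of mapping, the number of partitions stays independent of the
number of shards, at the cost of several shards sharing one partition.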


>
> I hope to be able to present this topic using 5 minutes during the
> upcoming community virtual meetup
> <https://cwiki.apache.org/confluence/display/SOLR/2023-07-19+Meeting+notes>
> (July 19th).
>

Looking forward to it!


>
> Ilan
>
>
> On Wed, Jul 12, 2023 at 10:54 AM Mikhail Khludnev <m...@apache.org> wrote:
>
> > Hello Ilan,
> > Late comment, though.
> >
> > On Fri, Apr 28, 2023 at 8:33 PM Ilan Ginzburg <ilans...@gmail.com> wrote:
> >
> > > ...
> > > We're considering improving this approach by making the transaction log a
> > > shard level abstraction (rather than a replica/node abstraction), and store
> > > it in S3 as well with a transaction log per shard, not per replica.
> > > This would allow indexing to not commit on every batch, speed up /update
> > > requests, push the constructed segments asynchronously to S3, guarantee
> > > data durability while still allowing nodes to be stateless (so can be shut
> > > down at any time in any number without data loss and without having to
> > > restart these nodes to recover data only they can access).
> > > ...
> > > Thanks,
> > > Ilan
> > >
> >
> > When discussing these (pretty cool) architectures I'm missing the point of
> > implementing a transaction log in the Solr codebase.
> > I think Kafka is the best fit for such a pre-indexer buffer. WDYT?
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev
