Re: using S3 as the Directory for Solr

dhurandar S Thu, 23 Apr 2020 18:50:32 -0700

Hi Jan,

Thank you for your reply. The reason we are looking at S3 is that the
volume is close to 10 petabytes.
We are okay with higher latency, say two or three times that of keeping
the data on local disk. But we have a requirement to retain long-range data
and provide search capability on it. Every storage option other than S3
turned out to be very expensive at that scale.


Basically I want to replace

-Dsolr.directoryFactory=HdfsDirectoryFactory \

 with an S3-based implementation.
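
Jan's reply below asks how an S3 directory would cache files and in what units it would store them. What follows is a minimal, hypothetical sketch of one possible answer, under stated assumptions: each index file is stored as one S3 object, read in fixed-size blocks through ranged reads (S3 supports HTTP Range requests on GET), and recently used blocks are kept in a local LRU cache. `BlobStore`, `InMemoryBlobStore` and `CachingBlockReader` are illustrative names, not Solr or Lucene APIs. The in-memory store stands in for a real S3 client, and a real implementation would still need to wrap this in Lucene `Directory`/`IndexInput` classes and a Solr `DirectoryFactory`, which is not shown.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical minimal view of an object store such as S3: ranged reads by key. */
interface BlobStore {
    byte[] getRange(String key, long offset, int length);
}

/** In-memory stand-in for S3 (assumption: no real S3 client is used here). */
class InMemoryBlobStore implements BlobStore {
    private final Map<String, byte[]> objects = new HashMap<>();
    int rangeRequests = 0; // counts simulated remote GETs

    void put(String key, byte[] data) {
        objects.put(key, data.clone());
    }

    @Override
    public byte[] getRange(String key, long offset, int length) {
        rangeRequests++;
        byte[] obj = objects.get(key);
        int start = (int) offset;
        int end = Math.min(obj.length, start + length); // the last block may be short
        byte[] out = new byte[end - start];
        System.arraycopy(obj, start, out, 0, end - start);
        return out;
    }
}

/** Hypothetical reader: fetches fixed-size blocks on demand and caches them in an LRU map. */
class CachingBlockReader {
    private final BlobStore store;
    private final int blockSize;
    private final Map<String, byte[]> cache;

    CachingBlockReader(BlobStore store, int blockSize, int maxCachedBlocks) {
        this.store = store;
        this.blockSize = blockSize;
        // accessOrder=true keeps the least recently used block first,
        // so removeEldestEntry evicts it once capacity is exceeded
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxCachedBlocks;
            }
        };
    }

    /** Returns the byte at position pos of the given file, fetching its block on a miss. */
    byte readByte(String file, long pos) {
        long blockIndex = pos / blockSize;
        String cacheKey = file + "#" + blockIndex;
        byte[] block = cache.get(cacheKey);
        if (block == null) {
            block = store.getRange(file, blockIndex * blockSize, blockSize);
            cache.put(cacheKey, block);
        }
        return block[(int) (pos % blockSize)];
    }
}
```

Block size and cache capacity are the main knobs: larger blocks mean fewer S3 requests per query but more bytes transferred per miss. The sketch is read-only and not thread-safe; since Lucene index files are write-once, writes could plausibly be handled as whole-file uploads, but that is left out here.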


regards,
Rahul





On Thu, Apr 23, 2020 at 3:12 AM Jan Høydahl <jan....@cominvent.com> wrote:

> Hi,
>
> Is your data partitioned in a way that would make it sensible to split it
> into multiple collections and arrange to keep only a few collections live
> at a time, loading index files from S3 on demand?
>
> I cannot see how an S3 directory could cache files from S3 effectively,
> or in what units the index files would be stored.
>
> Have you investigated EFS as an alternative? It would look like a
> normal filesystem to Solr and might be cheaper storage-wise, but it would
> be much slower.
>
> Jan
>
> > On 23 Apr 2020 at 06:57, dhurandar S <dhurandarg...@gmail.com> wrote:
> >
> > Hi,
> >
> > I am looking to use S3 as the place to store indexes, just as Solr uses
> > HdfsDirectory to store the index and all the other documents.
> >
> > We want to provide a search capability that may be a little slow but is
> > cheaper in terms of cost. We have close to 2 petabytes of data on which
> > we want to provide search using Solr.
> >
> > Are there any open-source implementations that use S3 as the Directory
> > for Solr?
> >
> > Any recommendations on this approach?
> >
> > regards,
> > Rahul
>
>
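
Jan's first suggestion, keeping only a few collections live and loading index files from S3 on demand, can be pictured as an LRU policy over whole collections rather than over blocks. A minimal sketch, assuming the data is partitioned into collections by some key such as time period (names like `logs_2019_01` are hypothetical). `LiveCollectionManager` is an illustrative class, and the `loader`/`unloader` callbacks are placeholders for whatever actually copies index files from S3 to local disk and brings a collection online or takes it offline; no Solr API is called here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical manager that keeps at most maxLive collections loaded, evicting in LRU order. */
class LiveCollectionManager {
    private final int maxLive;
    private final Consumer<String> loader;   // placeholder: e.g. copy index files from S3, bring collection online
    private final Consumer<String> unloader; // placeholder: e.g. take collection offline, free local disk
    // accessOrder=true: iteration starts at the least recently used collection
    private final LinkedHashMap<String, Boolean> live = new LinkedHashMap<>(16, 0.75f, true);

    LiveCollectionManager(int maxLive, Consumer<String> loader, Consumer<String> unloader) {
        this.maxLive = maxLive;
        this.loader = loader;
        this.unloader = unloader;
    }

    /** Makes sure the collection is loaded before a query is routed to it. */
    void ensureLive(String collection) {
        if (live.get(collection) != null) {
            return; // already live; get() also marked it most recently used
        }
        loader.accept(collection);
        live.put(collection, Boolean.TRUE);
        if (live.size() > maxLive) {
            String eldest = live.keySet().iterator().next();
            live.remove(eldest);
            unloader.accept(eldest);
        }
    }

    /** Live collections, least recently used first. */
    List<String> liveCollections() {
        return new ArrayList<>(live.keySet());
    }
}
```

A query router would call `ensureLive` before sending a request to a collection. The first query against a cold collection pays the full download cost, which may be far more than the two to three times local-disk latency mentioned at the top of the thread. The class is not thread-safe.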
