If all you need is better availability, I would start by trying out an additional replica of each shard on a different box, so each box would be serving the data for 2 shards and each shard would be available on 2 boxes.
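A minimal sketch of that layout, assuming the 8 shards are named shard1 through shard8, that shard N currently lives on box N, and that the Solr version in use exposes the Collections API ADDREPLICA action. The host names, port, and node-name format below are hypothetical placeholders, not values from this thread. The code only builds the request URLs; it does not contact Solr.

```python
from urllib.parse import urlencode


def replica_plan(nodes, shards):
    """Pair each shard with the next box in the ring.

    Assumes shards[i] is currently hosted on nodes[i], so the extra
    replica of shard i goes to nodes[i + 1] (wrapping around). Every box
    then serves 2 shards and every shard lives on 2 boxes.
    """
    if len(nodes) != len(shards) or len(nodes) < 2:
        raise ValueError("need one node per shard and at least two nodes")
    return [(shard, nodes[(i + 1) % len(nodes)])
            for i, shard in enumerate(shards)]


def add_replica_url(base_url, collection, shard, node):
    """Build a Collections API ADDREPLICA request URL for one shard."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical node names; use the names your cluster reports.
    nodes = [f"solr{i}.example.com:8983_solr" for i in range(1, 9)]
    shards = [f"shard{i}" for i in range(1, 9)]
    for shard, node in replica_plan(nodes, shards):
        print(add_replica_url("http://solr1.example.com:8983/solr",
                              "collection1", shard, node))
```

Each printed URL would be requested once against any live node. Keep in mind that every box then holds two shards' worth of index, so disk and memory use per box roughly doubles.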

Michael Della Bitta
Applications Developer
o: +1 646 532 3062

appinions inc.
“The Science of Influence Marketing”
18 East 41st Street
New York, NY 10017
t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>

On Mon, Sep 15, 2014 at 1:29 PM, Amey - codeinventory <ameyjad...@codeinventory.com> wrote:

> well, i have 8 m1.large ec2 having 2 core 7gb ram and 1tb ebs attached to
> each server for index.
>
> in my case i dont expect index to be store in ram neither a quick reply as
> its not a real time application, i just want fault tolerance in application
> and availability of full data.
>
> Is it good to use HDFS over normal solr cloud?
>
> Best,
> Amey
>
> --- Original Message ---
>
> From: "Michael Della Bitta" <michael.della.bi...@appinions.com>
> Sent: September 15, 2014 9:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Moving to HDFS, How to merge indices from 8 servers ?
>
> There's not much about Solr Cloud or HDFS indexes that suggests you should
> only have one logical shard. If your goal is better uptime with a sharded
> index, you should add more replicas.
>
> If your collection is small enough that one machine can serve one query
> with acceptable performance, but you want to scale to many queries, then
> just adding mirrors of a single-sharded collection is fine. But that's a
> big "if."
>
> Switching to HDFS is an option if you have enough RAM for your whole
> collection, and have a lot of existing storage devoted to HDFS, or if you
> want to batch create indexes. It's not really aimed at preserving uptime as
> far as I know.
>
> On Mon, Sep 15, 2014 at 11:23 AM, Amey Jadiye
> <ameyjad...@codeinventory.com> wrote:
>
> > Thanks for reply Erik,
> >
> > I think i have some misconfusion about how SOLR works with HDFS, and
> > solution i am thinking could be reorganised by user community :)
> >
> > Here is the actual solution/situation which is implemented by me
> >
> > *Usecase* : I need a google like search engine which should be work in
> > distributed and fault tolerant mode, we are collecting the health related
> > URLs from a third party system in large amount, approx 1Million/hour. we
> > want to build an inventory which contains all of there detail. now i am
> > fetching that URL data breaking it in H1, P, Div like tags with help of
> > Jsoup lib and putting in Solr as a documents with different boost to
> > different fields.
> >
> > Now after the putting this data, i have a custom program with which we
> > categorise all the data Example. All the cancer related pages, i am
> > querying the SOLR and fetching all URL related to cancer with CursorMark
> > and putting in a file for further use of our system.
> >
> > *Old Solution* : For this i have build the 8 SOLR servers with 3
> > zookeepers on the individual AWS Ec2 instances with one collection:8
> > shards problem with this solution is whenever any instance go down i am
> > loosing that data for a moment. link of current solution
> > http://postimg.org/image/luli3ybtj/
> >
> > *New _OR_ could be faulty solution* : I am thinking that if i use HDFS
> > which is virtually only one file system is better so if my server go down
> > that data is available through another server, below is steps i am
> > thinking to do.
> >
> > 1> I will merge all the 8 server indices somewhere in to one.
> > 2> Make setting for HDFS on same 8 servers.
> > 3> Put the merged index folder in HDFS so it will be distributed in 8
> >    servers physically it self.
> > 4> Restart 8 servers pointing to HDFS on each instance.
> > 5> and now i am ready to go for putting data on 8 servers and fetching
> >    through any one of SOLR , if that is down choose another so it will be
> >    guaranteed to get all the data.
> >
> > So is this solution sounds good, OR you guys suggest me another better
> > solution ?
> >
> > Regards,
> > Amey
> >
> > > Date: Thu, 11 Sep 2014 14:41:48 -0700
> > > Subject: Re: Moving to HDFS, How to merge indices from 8 servers ?
> > > From: erickerick...@gmail.com
> > > To: solr-user@lucene.apache.org
> > >
> > > Um, I really think this is pretty likely to not be a great solution.
> > > When you say "merge indexes", I'm thinking you want to go from 8
> > > shards to 1 shard. Now, this can be done with the "merge indexes" core
> > > admin API, see:
> > > https://wiki.apache.org/solr/MergingSolrIndexes
> > >
> > > BUT.
> > > 1> This will break all things SolrCloud-ish assuming you created your
> > > 8 shards under SolrCloud.
> > > 2> Solr is usually limited by memory, so trying to fit enough of your
> > > single huge index into memory may be problematical.
> > >
> > > This feels like an XY problem, _why_ are you asking about this? What
> > > is the use-case you want to handle by this?
> > >
> > > Best,
> > > Erick
> > >
> > > On Thu, Sep 11, 2014 at 7:44 AM, Amey Jadiye
> > > <ameyjad...@codeinventory.com> wrote:
> > > > FYI, I searched the google for this problem but didn't find any
> > > > satisfactory answer.
> > > >
> > > > Here is the current situation : I have the 8 shards in my solr cloud
> > > > backed up with 3 zookeeper all are setup on AWS EC2 instances, all 8
> > > > are leader with no replicas.
> > > >
> > > > I have only 1 collection say collection1 divided in 8 shards, i have
> > > > configured the index and tlog folder on each server pointing into 1TB
> > > > EBS disk attached to each servers, all 8 servers are having around
> > > > 100GB for index folder each. so total index files i have is ~800Gb.
> > > >
> > > > Now, i want to move all the data to HDFS, so I am going to
> > > > setup the HDFS on all 8 servers
> > > > Merge all the indexes from 8 servers
> > > > Put in HDFS.
> > > > Stop and Start my all solr servers on HDFS to access that common
> > > > index data with setting below cp parameter and few more.
> > > >
> > > > -Dsolr.directoryFactory=HdfsDirectoryFactory
> > > > -Dsolr.lock.type=hdfs
> > > > -Dsolr.data.dir=hdfs://host:port/path
> > > > -Dsolr.updatelog=hdfs://host:port/path
> > > > -jar
> > > >
> > > > Now could you tell me is this correct approach? if yes how can i
> > > > merge all indices from 8 server ?
> > > >
> > > > Regards,
> > > > Amey