Thanks for the reply, Erick. I think I have some misunderstanding about how Solr works with HDFS, and the solution I have in mind could probably be improved by the user community :) Here is the actual situation and the solution I have implemented.

*Usecase*: I need a Google-like search engine that works in a distributed and fault-tolerant way. We collect health-related URLs from a third-party system in large volumes, approximately 1 million per hour, and we want to build an inventory that contains all of their details. Currently I fetch each URL's content, break it into H1, P, DIV and similar tags with the Jsoup library, and put it into Solr as documents, with different boosts on different fields. After the data is in, a custom program categorises all of it. For example, to collect all the cancer-related pages, I query Solr with CursorMark, fetch every URL related to cancer, and write them to a file for further use by our system (a sketch of that loop is in the P.S. below).

*Old solution*: For this I built 8 Solr servers with 3 ZooKeepers on individual AWS EC2 instances, with one collection split into 8 shards. The problem with this setup is that whenever any instance goes down, I lose access to that shard's data for the duration. Diagram of the current setup: http://postimg.org/image/luli3ybtj/

*New (or possibly faulty) solution*: I am thinking that HDFS, being effectively a single distributed file system, would be better: if one of my servers goes down, its data is still available through another server. These are the steps I have in mind:

1> Merge all 8 servers' indices into one (a sketch of the merge call I mean is in the P.P.S. at the bottom).
2> Set up HDFS on the same 8 servers.
3> Put the merged index folder into HDFS, so it is physically distributed across the 8 servers.
4> Restart the 8 Solr servers, each pointing to HDFS.
5> Now I am ready to index to the 8 servers and query through any one of them; if that one is down I choose another, so I am guaranteed to get all the data.

Does this solution sound good, or can you suggest a better one?

Regards,
Amey
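P.S. In case it makes the categorisation step clearer, this is roughly the CursorMark loop our custom program runs (a minimal SolrJ 4.x sketch only; the field names body/url/id, the collection name and the ZooKeeper hosts are placeholders for our real setup):

import java.io.PrintWriter;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CancerUrlExport {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("collection1");

        SolrQuery q = new SolrQuery("body:cancer");
        q.setRows(1000);
        // cursorMark requires a sort that includes the uniqueKey field
        q.setSort(SolrQuery.SortClause.asc("id"));

        PrintWriter out = new PrintWriter("cancer-urls.txt");
        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
            QueryResponse rsp = solr.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                out.println((String) doc.getFieldValue("url"));
            }
            String next = rsp.getNextCursorMark();
            // the cursor has reached the end when it stops advancing
            if (cursorMark.equals(next)) break;
            cursorMark = next;
        }
        out.close();
        solr.shutdown();
    }
}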
> Date: Thu, 11 Sep 2014 14:41:48 -0700
> Subject: Re: Moving to HDFS, How to merge indices from 8 servers ?
> From: erickerick...@gmail.com
> To: solr-user@lucene.apache.org
>
> Um, I really think this is pretty likely to not be a great solution.
> When you say "merge indexes", I'm thinking you want to go from 8
> shards to 1 shard. Now, this can be done with the "merge indexes" core
> admin API, see:
> https://wiki.apache.org/solr/MergingSolrIndexes
>
> BUT.
> 1> This will break all things SolrCloud-ish assuming you created your
> 8 shards under SolrCloud.
> 2> Solr is usually limited by memory, so trying to fit enough of your
> single huge index into memory may be problematical.
>
> This feels like an XY problem, _why_ are you asking about this? What
> is the use-case you want to handle by this?
>
> Best,
> Erick
>
> On Thu, Sep 11, 2014 at 7:44 AM, Amey Jadiye
> <ameyjad...@codeinventory.com> wrote:
> > FYI, I searched Google for this problem but didn't find any
> > satisfactory answer. Here is the current situation: I have 8 shards in
> > my SolrCloud backed by 3 ZooKeepers, all set up on AWS EC2
> > instances; all 8 are leaders with no replicas. I have only 1 collection,
> > say collection1, divided into 8 shards. I have configured the index and
> > tlog folders on each server to point to a 1TB EBS disk attached to each
> > server, and all 8 servers have around 100GB in the index folder each, so
> > in total the index files are ~800GB. Now, I want to move all the data to
> > HDFS, so I am going to:
> > - set up HDFS on all 8 servers
> > - merge all the indexes from the 8 servers
> > - put them in HDFS
> > - stop and start all my Solr servers on HDFS to access that common
> > index data, setting the parameters below and a few more:
> > -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs
> > -Dsolr.data.dir=hdfs://host:port/path
> > -Dsolr.updatelog=hdfs://host:port/path -jar
> > Now could you tell me, is this the correct approach? If yes, how can I
> > merge all indices from the 8 servers?
> > Regards,
> > Amey
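P.P.S. For reference, this is the kind of CoreAdmin "merge indexes" call from the wiki page you linked that I had in mind when I asked about merging. It is only a sketch, with placeholder host, core and directory names: the source index dirs would first have to be copied somewhere the target node can read, with no live core holding them open (and, per your warning, the merged core would no longer behave as a proper SolrCloud collection):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class MergeShardsSketch {
    public static void main(String[] args) throws Exception {
        // CoreAdmin requests go to a node URL, not a collection URL
        HttpSolrServer admin = new HttpSolrServer("http://target-host:8983/solr");

        // Index directories copied over from the other shards; they must
        // not be open in any running core while the merge runs
        String[] indexDirs = {
            "/mnt/merge/shard2/index",
            "/mnt/merge/shard3/index"
            // ... one entry per remaining shard
        };

        // Merge them into the index of the existing target core
        // (no source cores, so an empty srcCores array)
        CoreAdminRequest.mergeIndexes("collection1_shard1_replica1",
                indexDirs, new String[0], admin);

        admin.shutdown();
    }
}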