Thanks for the reply, Erick.
I think I have some confusion about how Solr works with HDFS, and the solution 
I have in mind could probably be reworked by the user community :)
Here is the actual situation and the solution I have implemented.
*Usecase* : I need a Google-like search engine that works in a distributed, 
fault-tolerant mode. We collect health-related URLs from a third-party system 
in large volume, approx. 1 million/hour, and we want to build an inventory that 
contains all of their details. I currently fetch the content of each URL, break 
it up by tags such as H1, P and Div with the help of the Jsoup library, and put 
it into Solr as documents, with different boosts for different fields.
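To make the field-splitting step concrete, here is a rough sketch of the idea. It uses Python's standard html.parser purely as a stand-in for Jsoup (it is not what I actually run), and the field names such as h1_txt are made up for illustration. Boosts are left out of the sketch.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collects the text found inside <h1>, <p> and <div> elements."""

    WANTED = ("h1", "p", "div")

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open wanted tags
        self.fields = {tag: [] for tag in self.WANTED}

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop until the matching open tag has been removed.
            while self.open_tags and self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_tags:
            # Text is credited to the innermost open wanted tag.
            self.fields[self.open_tags[-1]].append(text)


def html_to_solr_doc(url, html):
    """Builds a dict shaped like a Solr document (field names are hypothetical)."""
    parser = TagTextExtractor()
    parser.feed(html)
    parser.close()
    doc = {"id": url}
    for tag, chunks in parser.fields.items():
        doc[tag + "_txt"] = " ".join(chunks)
    return doc
```

Each resulting dict would then be sent to Solr as one document, with the per-field boosts applied however the schema or query setup handles them.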
After this data has been indexed, a custom program categorises it. For example, 
for all the cancer-related pages, I query Solr, fetch all the URLs related to 
cancer using CursorMark, and write them to a file for further use by our 
system.
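For reference, the CursorMark loop looks roughly like the sketch below. It assumes the unique key field is called id and that the collection URL and the query string are supplied by the caller; those names are placeholders, not my real setup. The fetch argument exists only so the loop can be tried without a live Solr.

```python
import json
import urllib.parse
import urllib.request


def http_fetch(base_url, params):
    """Runs one /select request and returns the parsed JSON response."""
    url = base_url + "/select?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_docs(base_url, query, rows=1000, fetch=http_fetch):
    """Yields every matching document by following Solr's cursorMark."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": query,
            "fl": "id",
            "sort": "id asc",  # cursor paging needs a sort that includes the unique key
            "rows": rows,
            "wt": "json",
            "cursorMark": cursor,
        }
        data = fetch(base_url, params)
        for doc in data["response"]["docs"]:
            yield doc
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            break  # the cursor stopped moving, so there are no more results
        cursor = next_cursor
```

Writing the file is then just a loop over iter_docs(...) that writes one id per line.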
*Old Solution* : For this I have built 8 Solr servers with 3 ZooKeepers on 
individual AWS EC2 instances, with one collection split into 8 shards. The 
problem with this solution is that whenever any instance goes down, I 
temporarily lose access to that instance's data. 
A diagram of the current solution: http://postimg.org/image/luli3ybtj/ 
*New _OR_ could be faulty solution* : I am thinking that using HDFS, which is 
virtually a single file system, would be better: if one of my servers goes 
down, its data would still be available through another server. Below are the 
steps I am thinking of.
1 > Merge the indices from all 8 servers into one, somewhere.
2 > Set up HDFS on the same 8 servers.
3 > Put the merged index folder in HDFS, so that it is physically distributed 
    across the 8 servers.
4 > Restart the 8 Solr servers, with each instance pointing to HDFS.
5 > Now I am ready to put data in through the 8 servers and fetch it through 
    any one Solr server; if that one is down I choose another, so getting all 
    the data is guaranteed.
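The "if that one is down, choose another" part of step 5 would be something like the sketch below on the client side. The host list and the path are placeholders. This only picks a server that answers; whether that server can actually see all the data still depends on how the index is stored, which is exactly the part I am unsure about.

```python
import urllib.request


def open_url(url, timeout=5):
    """Fetches a URL and returns the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def query_with_failover(hosts, path, fetch=open_url):
    """Tries each Solr host in order; returns (host, body) from the first that answers."""
    last_error = None
    for host in hosts:
        try:
            return host, fetch(host + path)
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            last_error = exc
    raise RuntimeError("no Solr host reachable") from last_error
```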
So does this solution sound good, or would you suggest a better one?
Regards,
Amey


> Date: Thu, 11 Sep 2014 14:41:48 -0700
> Subject: Re: Moving to HDFS, How to merge indices from 8 servers ?
> From: erickerick...@gmail.com
> To: solr-user@lucene.apache.org
> 
> Um, I really think this is pretty likely to not be a great solution.
> When you say "merge indexes", I'm thinking you want to go from 8
> shards to 1 shard. Now, this can be done with the "merge indexes" core
> admin API, see:
> https://wiki.apache.org/solr/MergingSolrIndexes
> 
> BUT.
> 1>  This will break all things SolrCloud-ish assuming you created your
> 8 shards under SolrCloud.
> 2> Solr is usually limited by memory, so trying to fit enough of your
> single huge index into memory may be problematical.
> 
> This feels like an XY problem, _why_ are you asking about this? What
> is the use-case you want to handle by this?
> 
> Best,
> Erick
> 
> On Thu, Sep 11, 2014 at 7:44 AM, Amey Jadiye
> <ameyjad...@codeinventory.com> wrote:
> > FYI, I searched the google for this problem but didn't find any 
> > satisfactory answer.
> > Here is the current situation : I have the 8 shards in my solr cloud 
> > backed up with 3 zookeeper all are setup on AWS EC2 instances, all 8 are 
> > leader with no replicas. I have only 1 collection say collection1 divided 
> > in 8 shards, i have configured the index and tlog folder on each server 
> > pointing into 1TB EBS disk attached to each servers, all 8 servers are 
> > having around 100GB for index folder each. so total index files i have is 
> > ~800Gb.
> > Now, i want to move all the data to HDFS, so I am going to
> > - setup the HDFS on all 8 servers
> > - Merge all the indexes from 8 servers
> > - Put in HDFS.
> > - Stop and Start my all solr servers on HDFS to access that common index 
> >   data with setting below cp parameter and few more.
> >     -Dsolr.directoryFactory=HdfsDirectoryFactory
> >     -Dsolr.lock.type=hdfs
> >     -Dsolr.data.dir=hdfs://host:port/path
> >     -Dsolr.updatelog=hdfs://host:port/path -jar
> > Now could you tell me is this correct approach? if yes how can i merge 
> > all indices from 8 server ?
> > Regards,
> > Amey
                                          
