Fixed some typos, added a few words, and resent the same question =>

I want to build a Solr search engine for over 60 years of news articles. My 
requirements are (I use Solr 5.4.1):
 
1> Currently over 10M documents.
2> Currently over 60GB of total data.
3> The number of documents and the data size will keep growing at a rate of 
about 1,000 documents (roughly 8MB) per day.
4> There are 5-6 different newspaper types in total.
 
My questions are:
1> Is it workable enough to just use the master-slave model? Or should I turn 
to SolrCloud? (I ask this because our system management group has never 
managed a distributed system before, and they have no knowledge of ZooKeeper, 
shards, etc. They also don't know how to back up/restore distributed data.)
2> Say I choose SolrCloud anyway. I wish to keep each shard owning one 
specific year of data. Can this be done? What configuration do I need? (AFAIK, 
SolrCloud distributes data based on an intrinsic routing algorithm.)
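(For context on what I mean: I understand SolrCloud's default compositeId 
router hashes documents across shards, but there is also an "implicit" router 
where a field value picks the shard. A minimal sketch of what I imagine, 
assuming a collection named "news" and a string field "year_s" holding the 
year prefixed to match the shard name - both names are my assumptions, not 
something we have set up:)

```shell
# Create a collection whose shards are named after years. With
# router.name=implicit plus router.field, each document is sent to the
# shard whose name equals its year_s value (e.g. year_s = "y1956"),
# bypassing hash-based routing entirely.
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=news&router.name=implicit&shards=y1956,y1957,y1958&router.field=year_s&maxShardsPerNode=3'
```

(My understanding is that with the implicit router a new shard for each new 
year could later be added with the Collections API's CREATESHARD action, but 
please correct me if that is wrong.)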
3> If I wish to create another Solr engine with only one or two particular 
paper types, is it possible to copy their index data directly from the big 
central Solr engine? Or do I have to rebuild the index from the raw article 
data? (Our business may well need this.)
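(To clarify what I've considered so far: since the index segments mix all 
paper types together, I suspect a raw file copy cannot isolate one type, and 
the documents would instead have to be exported by query and re-posted. A 
minimal sketch of one export page, assuming a collection "news", a field 
"papertype", and that all needed fields are stored - all assumptions on my 
part:)

```shell
# Fetch one page of a cursor-based export of a single paper type from the
# central collection. Repeating the request with the returned nextCursorMark
# (until it stops changing) walks the full result set, and each page can be
# posted to the new engine's /update handler.
curl -s 'http://central-solr:8983/solr/news/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'fq=papertype:TypeA' \
  --data-urlencode 'sort=id asc' \
  --data-urlencode 'rows=1000' \
  --data-urlencode 'cursorMark=*' \
  --data-urlencode 'wt=json'
```

(If copying index files directly is in fact possible for this case, I'd be 
glad to hear how.)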
 
I'd like to hear any good suggestions and experiences.
 
Thanks in advance and best regards.

Scott Chu @ 2016/5/11  11:26 GMT+8
