Re: Strategy for handling large (and growing) index: horizontal partitioning?

Otis Gospodnetic Thu, 28 Feb 2008 22:55:34 -0800

James,

I can't comment further on the social network's architecture choices.


Here are the answers to your questions:
- 1 Solr instance can hold 1+ indices, either via JNDI (see Wiki) or via the 
new multi-core support which works, but is still being hacked on
- See SOLR-303 in JIRA for distributed search.  Yonik committed it just the 
other day, so now that's in nightly builds if you want to try it.  There are 2 
Wiki pages about that, too, see Recent changes log on the Wiki to quickly find 
them.
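
To make the two points above concrete, here is a rough Python sketch, not a tested setup. It keeps a user->shard lookup table that fills each index with n users, and builds a select URL that uses the `shards` request parameter from the distributed search work. The host names, ports, paths and function names are all made up for illustration, and the fill-in-order assignment is just one simple way to keep the mapping.

```python
from urllib.parse import urlencode

# Hypothetical shard addresses (host:port/path); not taken from this thread.
SHARDS = ["localhost:8983/solr", "localhost:7574/solr"]


def assign_shard(user_id, mapping, shards, users_per_shard):
    """Record and return the shard for user_id, filling shards in order."""
    if user_id in mapping:
        return mapping[user_id]
    index = len(mapping) // users_per_shard
    if index >= len(shards):
        raise RuntimeError("all shards are full; add another shard")
    mapping[user_id] = shards[index]
    return mapping[user_id]


def distributed_query_url(query, shards):
    """Build a select URL that asks the first shard to query every shard."""
    params = urlencode({"q": query, "shards": ",".join(shards)})
    return "http://%s/select?%s" % (shards[0], params)
```

A search for one user's data only needs the single shard returned by assign_shard(); the distributed URL is for queries that have to span several indices.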

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: James Brady <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, February 29, 2008 1:11:07 AM
> Subject: Re: Strategy for handling large (and growing) index: horizontal 
> partitioning?
> 
> Hi Otis,
> Thanks for your comments -- I didn't realise the wiki is open to  
> editing; my apologies. I've put in a few words to try and clear  
> things up a bit.
> 
> So determining n will probably be a best guess followed by trial and  
> error, that's fine. I'm still not clear about whether single Solr  
> servers can operate across several indices, however.. can anyone give  
> me some pointers here?
> An alternative would be to have 1 index per instance, and n instances  
> per server, where n is small. This might actually be a practical  
> solution -- I'm spending ~20% of my time committing, so I should  
> probably only have 3 or 4 indices in total per server to avoid two  
> committing at the same time.
> 
> Your mention of The Large Social Network was interesting! A social  
> network's data is by definition pretty poorly partitioned by user id,  
> so unless they've done something extremely clever like co-locating  
> social cliques in the same indices, I would have thought it would be a  
> sub-optimal architecture. If my friends and I are scattered around  
> different indices, each search would have to be federated massively.
> 
> James
> 
> 
> On 28 Feb 2008, at 20:49, Otis Gospodnetic wrote:
> 
> > James,
> >
> > Regarding your questions about n users per index - this is a fine  
> > approach.  The largest Social Network that you know of uses the  
> > same approach for various things, including full-text indices (not  
> > Solr, but close).  You'd have to maintain user->shard/index mapping  
> > somewhere, of course.  What should the n be, you ask?  Look at the  
> > overall index size, I'd say, against server capabilities (RAM,  
> > disk, CPU), increase n up to a point where you're maximizing your  
> > hardware at some target query rate.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> >> From: James Brady 
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, February 27, 2008 10:08:02 PM
> >> Subject: Strategy for handling large (and growing) index:  
> >> horizontal partitioning?
> >>
> >> Hi all,
> >> Our current setup is a master and slave pair on a single machine,
> >> with an index size of ~50GB.
> >>
> >> Query and update times are still respectable, but commits are taking
> >> ~20% of the time on the master, while our daily index optimise can take up to
> >> 4 hours...
> >> Here's the most relevant part of solrconfig.xml:
> >>      true
> >>      10
> >>      1000
> >>      10000
> >>      10000
> >>
> >> I've given both master and slave 2.5GB of RAM.
> >>
> >> Does an index optimise read and re-write the whole thing? If so,
> >> taking about 4 hours is pretty good! However, the documentation here:
> >> http://wiki.apache.org/solr/CollectionDistribution?highlight=%28ten
> >> +minutes%29#head-cf174eea2524ae45171a8486a13eea8b6f511f8b
> >> states "Optimizations can take nearly ten minutes to run..." which
> >> leads me to think that we've grossly misconfigured something...
> >>
> >> Firstly, we would obviously love any way to reduce this optimise time
> >> - I have yet to experiment extensively with the settings above, and
> >> optimise frequency, but some general guidance would be great.
> >>
> >> Secondly, this index size is increasing monotonically over time and as
> >> we acquire new users. We need to take action to ensure we can scale
> >> in the future. The approach we're favouring at the moment is
> >> horizontal partitioning of indices by user id as our data suits this
> >> scheme well. A given index would hold the indexed data for n users,
> >> where n would probably be between 1 and 100 users, and we will have
> >> multiple indices per search server.
> >>
> >> Running a server per index is impractical, especially for a small n, so
> >> is a single Solr instance capable of managing multiple searchers and
> >> writers in this way? Following on from that, does anyone know of
> >> limiting factors in Solr or Lucene that would influence our decision
> >> on the value of n - the number of users per index?
> >>
> >> Thanks!
> >> James
> >>
> >>
> >>
> >>
> >
> >
> 
>
