Re: Strategy for handling large (and growing) index: horizontal partitioning?

Otis Gospodnetic Thu, 28 Feb 2008 20:46:02 -0800

Hola James,

I think what Wunder was suggesting was really a copy (time cp -a oldIndex 
newIndex).
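
If you would rather script that timing than run it by hand, here is a rough Python sketch. The function names and the throwaway demo directory are made up for illustration; only the oldIndex/newIndex names come from the cp command above. Treat the number it prints as a floor for optimise time, not a prediction, and keep in mind that the OS page cache can make a repeated copy look faster than the disk really is.

```python
import shutil
import tempfile
import time
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Total size of all regular files under path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def time_index_copy(src: Path, dst: Path) -> tuple[float, float]:
    """Copy src to dst and return (elapsed_seconds, megabytes_per_second)."""
    size = dir_size_bytes(src)
    start = time.perf_counter()
    shutil.copytree(src, dst)  # dst must not already exist
    elapsed = time.perf_counter() - start
    mb_per_s = (size / 1e6) / elapsed if elapsed > 0 else float("inf")
    return elapsed, mb_per_s


if __name__ == "__main__":
    # Demo on a small throwaway directory; point src at the real index instead.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "oldIndex"
        src.mkdir()
        (src / "segment.bin").write_bytes(b"\0" * 1_000_000)
        elapsed, rate = time_index_copy(src, Path(tmp) / "newIndex")
        print(f"copied in {elapsed:.3f}s ({rate:.1f} MB/s)")
```

An optimise has to read and rewrite every byte of the index, so it should not be expected to finish faster than this copy does on the same disk.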


I'm not sure why you have both the master and the slave on the same box.... :)

As for the "10 minute" claim on the Wiki: please go ahead and edit it. Anyone 
can get an account and help with the Wiki.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: James Brady <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, February 28, 2008 12:59:50 PM
> Subject: Re: Strategy for handling large (and growing) index: horizontal 
> partitioning?
> 
> Hi, yes a post-optimise copy takes 45 minutes at present. Disk IO is  
> definitely the bottleneck, you're right -- iostat was showing 100%  
> utilisation for the 5 hours it took to optimise yesterday...
> 
> The master and slave are on the same disk, and it's definitely on my  
> list to fix that, but the searcher is so lightly loaded compared to  
> the indexer that I don't think it will win us too much.
> 
> As there has been another optimise time question on the list today  
> could I request that the "10 minute" claim is taken off the  
> CollectionDistribution wiki page? It's extremely misleading for  
> newcomers who don't necessarily realise an optimise entails reading  
> and writing the whole index, and that optimise time is going to be at  
> least O(n).
> 
> James
> 
> 
> On 28 Feb 2008, at 09:07, Walter Underwood wrote:
> 
> > Have you timed how long it takes to copy the index files? Optimizing
> > can never be faster than that, since it must read every byte and write
> > a whole new set. Disc speed may be your bottleneck.
> >
> > You could also look at disc access rates in a monitoring tool.
> >
> > Is there read contention between the master and slave for the same  
> > disc?
> >
> > wunder
> >
> > On 2/27/08 7:08 PM, "James Brady" wrote:
> >
> >> Hi all,
> >> Our current setup is a master and slave pair on a single machine,
> >> with an index size of ~50GB.
> >>
> >> Query and update times are still respectable, but commits are taking
> >> ~20% of time on the master, while our daily index optimise can take up to
> >> 4 hours...
> >> Here's the most relevant part of solrconfig.xml:
> >>      true
> >>      10
> >>      1000
> >>      10000
> >>      10000
> >>
> >> I've given both master and slave 2.5GB of RAM.
> >>
> >> Does an index optimise read and re-write the whole thing? If so,
> >> taking about 4 hours is pretty good! However, the documentation here:
> >> http://wiki.apache.org/solr/CollectionDistribution?highlight=%28ten
> >> +minutes%29#head-cf174eea2524ae45171a8486a13eea8b6f511f8b
> >> states "Optimizations can take nearly ten minutes to run..." which
> >> leads me to think that we've grossly misconfigured something...
> >>
> >> Firstly, we would obviously love any way to reduce this optimise time
> >> - I have yet to experiment extensively with the settings above, and
> >> optimise frequency, but some general guidance would be great.
> >>
> >> Secondly, this index size is increasing monotonically over time and as
> >> we acquire new users. We need to take action to ensure we can scale
> >> in the future. The approach we're favouring at the moment is
> >> horizontal partitioning of indices by user id as our data suits this
> >> scheme well. A given index would hold the indexed data for n users,
> >> where n would probably be between 1 and 100 users, and we will have
> >> multiple indices per search server.
> >>
> >> Running a server per index is impractical, especially for a small n, so
> >> is a single Solr instance capable of managing multiple searchers and
> >> writers in this way? Following on from that, does anyone know of
> >> limiting factors in Solr or Lucene that would influence our decision
> >> on the value of n - the number of users per index?
> >>
> >> Thanks!
> >> James
> >>
> >>
> >>
> >
> 
>
