1a. Multiple Solr instances partitioned by user_id%N, with index
files segmented by user_id field.

That can scale rather gracefully, though it does need reindexing
to add a server.

wunder

On 2/26/09 3:44 AM, "Vikram B. Kumar" <vikrambku...@gmail.com> wrote:

> Hi All,
> 
> Our web based document management system has few thousand users and is
> growing rapidly. Like any SaaS, while we support a lot of customers,
> only few of them (those logged in) will be reading their index and only
> a subset of those logged in (who are adding documents) will be writing
> to their index.
> 
> i.,e TU > L > U
> 
> and TU ~ 100 x L
> 
> where TU is total no of users, L is logged in users who are searching
> and U is the uploaders who are updating their index.
> 
> We have been using Lucene over a simple RESTful server for searching.
> Indexing is currently done using regular JavaSE based setup, instead of
> a server. We are thinking about moving to Solr to scale better and to
> get rid of the latency associated with our non-live JavaSE based
> indexer. We have a custom Analyzer/Filter that adds some payload to each
> term to support our web based service.
> 
> My message is about on how best to partition the index to support
> multiple users.
> 
> Hardware: The servers I have are 64 bit 1.7GHz x 2xDual Core (i.,e 4
> cores totally) with 1/2 TB disks. By my estimate, 1/2 TB can support
> 8000-10000 users before I need to start sharding them across multiple hosts.
> 
> I have thought of the following options:
> 
> 1. One Monilithic index, but index files segmented by user_id field.
> 
> 2. MultiCore - One core per user.
> 
> 3. Multiple Solr instances - Non scalable.
> 
> 4. Don't use Solr, but enhance our Lucene +RESTful server model to
> support indexing as well. - Least favored approach as we will be doing a
> lot of things that Solr already does (replication, live
> add/update/delete). Most of the things we are doing, can be done with
> Solr's pluggable query handlers. (I guess this is not a true option at all).
> 
> I am currently favouring Option 2 though want to try out whether 1 works
> as well.
> 
> Looks like some of the most obvious problems with MultiCores are "too
> many open file" problems, which can be handled with hardware and
> software boundaries (properly close index after updating and after users
> logout).
> 
> My questions:
> 
> 1. Can our analyzers/filters be plugged into Solr during the time of
> indexing?
> 2. Does option 2 fit the above needs? Has anybody done option 2 with
> thousands of cores in a Solr instance?
> 3. Does option 2 to support horizontal scaling (sharding?)
> 
> Thanks,
> Vikram
> 
> 

Reply via email to