Hi Wunder,

Can you please elaborate?

Vikram
On Thu, Feb 26, 2009 at 10:13 AM, Walter Underwood <wunderw...@netflix.com> wrote:

> 1a. Multiple Solr instances partitioned by user_id%N, with index
> files segmented by user_id field.
>
> That can scale rather gracefully, though it does need reindexing
> to add a server.
>
> wunder
>
> On 2/26/09 3:44 AM, "Vikram B. Kumar" <vikrambku...@gmail.com> wrote:
>
> > Hi All,
> >
> > Our web based document management system has a few thousand users and is
> > growing rapidly. Like any SaaS, while we support a lot of customers,
> > only a few of them (those logged in) will be reading their index and only
> > a subset of those logged in (who are adding documents) will be writing
> > to their index.
> >
> > i.e. TU > L > U
> >
> > and TU ~ 100 x L
> >
> > where TU is the total number of users, L is the logged-in users who are
> > searching and U is the uploaders who are updating their index.
> >
> > We have been using Lucene over a simple RESTful server for searching.
> > Indexing is currently done using a regular JavaSE based setup, instead of
> > a server. We are thinking about moving to Solr to scale better and to
> > get rid of the latency associated with our non-live JavaSE based
> > indexer. We have a custom Analyzer/Filter that adds some payload to each
> > term to support our web based service.
> >
> > My message is about how best to partition the index to support
> > multiple users.
> >
> > Hardware: The servers I have are 64 bit 1.7GHz x 2xDual Core (i.e. 4
> > cores in total) with 1/2 TB disks. By my estimate, 1/2 TB can support
> > 8000-10000 users before I need to start sharding them across multiple
> > hosts.
> >
> > I have thought of the following options:
> >
> > 1. One monolithic index, but index files segmented by user_id field.
> >
> > 2. MultiCore - One core per user.
> >
> > 3. Multiple Solr instances - Non scalable.
> >
> > 4. Don't use Solr, but enhance our Lucene + RESTful server model to
> > support indexing as well. - Least favored approach as we will be doing a
> > lot of things that Solr already does (replication, live
> > add/update/delete). Most of the things we are doing can be done with
> > Solr's pluggable query handlers. (I guess this is not a true option at
> > all).
> >
> > I am currently favouring Option 2, though I want to try out whether 1 works
> > as well.
> >
> > Looks like some of the most obvious problems with MultiCores are "too
> > many open files" problems, which can be handled with hardware and
> > software boundaries (properly close the index after updating and after
> > users log out).
> >
> > My questions:
> >
> > 1. Can our analyzers/filters be plugged into Solr during the time of
> > indexing?
> > 2. Does option 2 fit the above needs? Has anybody done option 2 with
> > thousands of cores in a Solr instance?
> > 3. Does option 2 support horizontal scaling (sharding)?
> >
> > Thanks,
> > Vikram
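
For what it's worth, the user_id%N partitioning from option 1a could be sketched roughly as below. This is only an illustration in Python, not something taken from the thread: the host names in SHARD_URLS and the helpers shard_for, solr_url_for and moved_fraction are all hypothetical, and it assumes user ids are integers.

```python
from typing import Iterable

# Hypothetical Solr instances, one per partition (N = 3 here).
SHARD_URLS = [
    "http://solr0.example.com:8983/solr",
    "http://solr1.example.com:8983/solr",
    "http://solr2.example.com:8983/solr",
]


def shard_for(user_id: int, num_shards: int) -> int:
    """Return the partition number (0..N-1) for a user: user_id % N."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return user_id % num_shards


def solr_url_for(user_id: int) -> str:
    """Pick the (hypothetical) Solr instance that holds this user's documents."""
    return SHARD_URLS[shard_for(user_id, len(SHARD_URLS))]


def moved_fraction(user_ids: Iterable[int], old_n: int, new_n: int) -> float:
    """Fraction of users whose partition changes when N goes from old_n to new_n."""
    ids = list(user_ids)
    moved = sum(1 for u in ids if shard_for(u, old_n) != shard_for(u, new_n))
    return moved / len(ids)


if __name__ == "__main__":
    print(solr_url_for(10))                    # user 10 lives on partition 10 % 3
    print(moved_fraction(range(12), 3, 4))     # share of users that change partition
```

The second helper is there to show why adding a server needs reindexing: with plain modulo, going from 3 to 4 instances changes the partition of 9 out of every 12 consecutive user ids, so most users' documents would have to be indexed again on a different instance.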