Hi Wunder,
Can you please elaborate?

Vikram

On Thu, Feb 26, 2009 at 10:13 AM, Walter Underwood
<wunderw...@netflix.com>wrote:

> 1a. Multiple Solr instances partitioned by user_id%N, with index
> files segmented by user_id field.
>
> That can scale rather gracefully, though it does need reindexing
> to add a server.
>
> wunder
>
> On 2/26/09 3:44 AM, "Vikram B. Kumar" <vikrambku...@gmail.com> wrote:
>
> > Hi All,
> >
> > Our web based document management system has few thousand users and is
> > growing rapidly. Like any SaaS, while we support a lot of customers,
> > only few of them (those logged in) will be reading their index and only
> > a subset of those logged in (who are adding documents) will be writing
> > to their index.
> >
> > i.,e TU > L > U
> >
> > and TU ~ 100 x L
> >
> > where TU is total no of users, L is logged in users who are searching
> > and U is the uploaders who are updating their index.
> >
> > We have been using Lucene over a simple RESTful server for searching.
> > Indexing is currently done using regular JavaSE based setup, instead of
> > a server. We are thinking about moving to Solr to scale better and to
> > get rid of the latency associated with our non-live JavaSE based
> > indexer. We have a custom Analyzer/Filter that adds some payload to each
> > term to support our web based service.
> >
> > My message is about on how best to partition the index to support
> > multiple users.
> >
> > Hardware: The servers I have are 64 bit 1.7GHz x 2xDual Core (i.,e 4
> > cores totally) with 1/2 TB disks. By my estimate, 1/2 TB can support
> > 8000-10000 users before I need to start sharding them across multiple
> hosts.
> >
> > I have thought of the following options:
> >
> > 1. One Monilithic index, but index files segmented by user_id field.
> >
> > 2. MultiCore - One core per user.
> >
> > 3. Multiple Solr instances - Non scalable.
> >
> > 4. Don't use Solr, but enhance our Lucene +RESTful server model to
> > support indexing as well. - Least favored approach as we will be doing a
> > lot of things that Solr already does (replication, live
> > add/update/delete). Most of the things we are doing, can be done with
> > Solr's pluggable query handlers. (I guess this is not a true option at
> all).
> >
> > I am currently favouring Option 2 though want to try out whether 1 works
> > as well.
> >
> > Looks like some of the most obvious problems with MultiCores are "too
> > many open file" problems, which can be handled with hardware and
> > software boundaries (properly close index after updating and after users
> > logout).
> >
> > My questions:
> >
> > 1. Can our analyzers/filters be plugged into Solr during the time of
> > indexing?
> > 2. Does option 2 fit the above needs? Has anybody done option 2 with
> > thousands of cores in a Solr instance?
> > 3. Does option 2 to support horizontal scaling (sharding?)
> >
> > Thanks,
> > Vikram
> >
> >
>
>

Reply via email to