What is the best scalable scheme to support multiple users?

Vikram B. Kumar Thu, 26 Feb 2009 03:45:20 -0800

Hi All,

Our web-based document management system has a few thousand users and is growing rapidly. Like any SaaS, while we support a lot of customers, only a few of them (those logged in) will be reading their index, and only a subset of those logged in (those who are adding documents) will be writing to their index.


i.e., TU > L > U

and TU ~ 100 x L

where TU is the total number of users, L is the logged-in users who are searching, and U is the uploaders who are updating their index.

We have been using Lucene behind a simple RESTful server for searching. Indexing is currently done with a regular JavaSE-based setup rather than a server. We are thinking about moving to Solr to scale better and to get rid of the latency associated with our non-live JavaSE-based indexer. We have a custom Analyzer/Filter that adds a payload to each term to support our web-based service.

My message is about how best to partition the index to support multiple users.

Hardware: The servers I have are 64-bit, 2 x dual-core 1.7 GHz (i.e., 4 cores in total) with 1/2 TB disks. By my estimate, 1/2 TB can support 8000-10000 users before I need to start sharding them across multiple hosts.
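
Read the other way around, that estimate implies an average index size per user and, via TU ~ 100 x L, a rough number of users searching on one host at any time. A back-of-the-envelope sketch follows. It assumes decimal units, that the whole disk is available to indexes, and no headroom for segment merges; the function names are only illustrative.

```python
# Rough per-host capacity sketch. Assumptions (not measured): decimal units,
# the whole disk is available to indexes, no headroom for segment merges.
DISK_BYTES = 500 * 10**9      # 1/2 TB per host
USERS_PER_LOGGED_IN = 100     # from TU ~ 100 x L


def per_user_budget_mb(users, disk_bytes=DISK_BYTES):
    """Average index size per user (MB) if `users` indexes fill the disk."""
    return disk_bytes / users / 10**6


def logged_in_estimate(users, ratio=USERS_PER_LOGGED_IN):
    """Approximate number of users searching at once on one host."""
    return users // ratio


for users in (8000, 10000):
    print(users, "users:",
          per_user_budget_mb(users), "MB each,",
          logged_in_estimate(users), "logged in")
```

Under these assumptions the budget is about 50 to 62.5 MB of index per user, with roughly 80 to 100 users searching on a host at any one time.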


I have thought of the following options:

1. One monolithic index, with documents segmented by a user_id field.

2. MultiCore - One core per user.

3. Multiple Solr instances - Non scalable.

4. Don't use Solr, but enhance our Lucene + RESTful server model to support indexing as well. - Least favored approach, as we would be redoing a lot of things that Solr already does (replication, live add/update/delete). Most of the things we are doing can be done with Solr's pluggable query handlers. (I guess this is not a true option at all.)

I am currently favouring Option 2, though I want to try out whether Option 1 works as well.
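
To make the difference between Options 1 and 2 concrete, here is a minimal sketch of how a search request might be addressed in each case. With a monolithic index, every query would carry a filter query (`fq`) on user_id; with one core per user, the core name in the URL path selects the user's index. The base URL and the `user_<id>` core naming are assumptions made up for illustration, not something we have deployed.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust to the actual Solr host and webapp path.
SOLR_BASE = "http://localhost:8983/solr"


def monolithic_search_url(user_id, text, base=SOLR_BASE):
    """Option 1: one index, results restricted with fq=user_id:<id>."""
    params = {"q": text, "fq": f"user_id:{int(user_id)}", "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


def per_core_search_url(user_id, text, base=SOLR_BASE):
    """Option 2: one core per user; the core name 'user_<id>' is assumed."""
    params = {"q": text, "wt": "json"}
    return f"{base}/user_{int(user_id)}/select?{urlencode(params)}"


print(monolithic_search_url(42, "invoice 2009"))
print(per_core_search_url(42, "invoice 2009"))
```

Coercing user_id with int() keeps a malformed id from altering the filter. In Option 1 the isolation between users depends on that filter being applied to every request, whereas in Option 2 it follows from the core layout.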

It looks like one of the most obvious problems with MultiCore is the "too many open files" error, which can be handled with hardware and software boundaries (properly closing an index after updating it and after its user logs out).
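
The software boundary I have in mind could look roughly like the sketch below: keep at most a fixed number of indexes open, close the least recently used one when the limit is exceeded, and close a user's index on logout. This is plain bookkeeping, not a Solr API; `open_fn` and `close_fn` are hypothetical callbacks standing in for whatever actually opens and closes a core or index.

```python
from collections import OrderedDict


class OpenIndexCache:
    """Bounded set of open per-user indexes with least-recently-used eviction."""

    def __init__(self, max_open, open_fn, close_fn):
        self._max_open = max_open
        self._open_fn = open_fn      # hypothetical: user_id -> handle
        self._close_fn = close_fn    # hypothetical: handle -> None
        self._open = OrderedDict()   # user_id -> handle, oldest first

    def get(self, user_id):
        """Return the user's open handle, opening it (and evicting) if needed."""
        if user_id in self._open:
            self._open.move_to_end(user_id)  # mark as most recently used
            return self._open[user_id]
        handle = self._open_fn(user_id)
        self._open[user_id] = handle
        if len(self._open) > self._max_open:
            _, oldest = self._open.popitem(last=False)
            self._close_fn(oldest)
        return handle

    def release(self, user_id):
        """Close the user's index, e.g. on logout; no-op if it is not open."""
        handle = self._open.pop(user_id, None)
        if handle is not None:
            self._close_fn(handle)

    def open_count(self):
        return len(self._open)
```

The limit would be chosen from the process's file descriptor limit divided by the number of files an open index typically holds, which depends on the segment count and is something I would have to measure.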


My questions:

1. Can our analyzers/filters be plugged into Solr at indexing time?

2. Does option 2 fit the above needs? Has anybody done option 2 with thousands of cores in a Solr instance?

3. Does option 2 support horizontal scaling (sharding)?

Thanks,
Vikram
