On Thu, 10 Jul 2008 09:36:01 +0530 "Noble Paul" <[EMAIL PROTECTED]> wrote:

> > 2. We're assuming we'll have thousands of users with independent data; any
> > good way to partition multiple indexes with solr? With Lucene we could
> > just save those in independent directories, and cache the index while the
> > user session is active. I saw some configurations on tomcat that would
> > allow multiple instances, but that's probably not practical for lots of
> > concurrent users.
>
> Maintaining multiple indices is not a good idea. Add an extra
> attribute 'userid' to each document and search with user id as a 'fq'.
> The caches in Solr will automatically take care of the rest.

I have been pondering something similar for some of the stuff I'm working on. Intuitively, keeping independent indices doesn't look too good.

But if you split your setup (i.e., 2 different clusters if need be), you can keep one index for the information that doesn't change often (email body, from, to, date, headers?) plus the message id (or id = concat(message_id, userid)), and a separate index for the metadata of the documents in the first index. Every time you have updates to the mail metadata, you handle them in the second index. (I'm not sure if this 2nd index would be the definitive storage of metadata for mails, or whether it's stored in your mail app and you extract and index it into Solr afterwards.) There is of course the new issue of scrubbing the 2nd index when emails are removed from your system, but I don't imagine it being terribly complex.

This way, you can do without SOLR-139 until it is stable enough and scales as needed, or do without it altogether; I'm not sure how well SOLR-139 will progress.

Wrt the OP's question about how to partition the data for thousands of users, you should be able to use http://wiki.apache.org/solr/DistributedSearch , or set up different clusters, each with distributed searchers set up, using the userid to decide which cluster you'll search in (hash(userid) modulo the number of clusters should give you a roughly even distribution across all clusters).

Thoughts?
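
For what it's worth, the hash(userid) routing combined with the 'userid' fq could be sketched roughly like this. This is only an illustration under my own assumptions: the cluster URLs and the function names are made up, the field is assumed to be called 'userid' as suggested above, and user ids are assumed to be plain alphanumeric strings that need no Solr query escaping. It uses MD5 purely as a stable hash, so the same user maps to the same cluster across processes and restarts.

```python
import hashlib
from urllib.parse import urlencode

# Hypothetical cluster base URLs, for illustration only.
CLUSTERS = [
    "http://solr-a.example.com:8983/solr",
    "http://solr-b.example.com:8983/solr",
    "http://solr-c.example.com:8983/solr",
]


def cluster_for_user(userid, clusters=CLUSTERS):
    """Pick one cluster for a user id using a stable hash."""
    # A digest is used instead of Python's built-in hash(), which is
    # randomized per process for strings and so is not stable.
    digest = hashlib.md5(userid.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]


def search_url(userid, query, clusters=CLUSTERS):
    """Build a select URL on the user's cluster, filtered by userid."""
    base = cluster_for_user(userid, clusters)
    # 'fq' restricts results to this user's documents; Solr caches
    # filter queries separately from the main query.
    params = urlencode({"q": query, "fq": "userid:" + userid})
    return base + "/select?" + params


if __name__ == "__main__":
    print(search_url("alice", "subject:invoice"))
```

Note that a plain modulo like this remaps most users whenever the number of clusters changes, so it only suits a setup where the cluster count is fixed or a reindex is acceptable.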

B
_________________________
{Beto|Norberto|Numard} Meijome

Q. How do you make God laugh?
A. Tell him your plans.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.