Thanks for the responses; I also found this useful thread from back in early '07: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200702.mbox/browser (the "tagging" thread), and I might also start looking at facets as an approach. However, I'm assuming that facet updates are still just as heavy as a normal field update in Solr? (I haven't worked much with facets thus far.)
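
If I understand the facet approach correctly, pulling per-tag counts for a result set would look roughly like this with SolrJ; the core URL, the query, and the multi-valued "tag" field below are just placeholders, and I haven't actually tried this yet:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TagFacetSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at the real Solr instance.
        CommonsHttpSolrServer solr =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Run the user's search and ask Solr to facet on a (hypothetical)
        // multi-valued "tag" field, so tag counts come back with the results.
        SolrQuery q = new SolrQuery("body:foo");
        q.setFacet(true);
        q.addFacetField("tag");
        q.setFacetMinCount(1);

        QueryResponse rsp = solr.query(q);
        for (FacetField ff : rsp.getFacetFields()) {
            System.out.println(ff.getName() + ": " + ff.getValues());
        }
    }
}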
I also saw this ticket for core Lucene using a ParallelReader, but I'm not sure how much traction it has: https://issues.apache.org/jira/browse/LUCENE-1292 (a rough sketch of what I think that would look like is at the end of this mail).

Understood the limitations on partitioning, and hashing to a cluster is definitely an acceptable pattern for us, so we should be OK there.

ab

On Thu, Jul 10, 2008 at 12:06 AM, Noble Paul നോബിള് नोब्ळ् <[EMAIL PROTECTED]> wrote:
> On Thu, Jul 10, 2008 at 7:53 AM, aris buinevicius <[EMAIL PROTECTED]> wrote:
> > We're trying to implement a large-scale, domain-specific web email
> > application, and so far Solr performance on the search side is really
> > doing well for us.
> >
> > There are two limitations that I can't seem to get around, however, and
> > I was hoping for some advice.
> >
> > 1. We would like to do bulk tagging on large query result sets (i.e., if
> > you have 1M emails, do a search, and then wish to apply a tag to the
> > result set of, say, 250k results). I've tried many approaches, but the
> > closest support I could see was the update-field functionality in
> > SOLR-139. Is there any other way to keep the very dynamic metadata (tags
> > and other fields) abstracted away from the static documents themselves?
> > I've researched joining against a metadata database, but unfortunately
> > the join logic for large results is just too bulky to perform well at
> > scale. We've also looked at Postgres tsearch2, but that breaks down with
> > a large number of emails.
> Updating a large number of docs in one go is a bit expensive. SOLR-139 is
> trying to achieve that, but it is still expensive. If the users do not
> tag the docs too often, then it may be OK.
> >
> > 2. We're assuming we'll have thousands of users with independent data;
> > is there any good way to partition multiple indexes with Solr? With
> > Lucene we could just save those in independent directories and cache the
> > index while the user session is active. I saw some configurations on
> > Tomcat that would allow multiple instances, but that's probably not
> > practical for lots of concurrent users.
> Maintaining multiple indices is not a good idea. Add an extra
> attribute 'userid' to each document and search with the user id as an 'fq'.
> The caches in Solr will automatically take care of the rest.
> >
> > Thanks for any tips; we would love to use Solr (or Lucene), but we
> > haven't been able to get around issue 1 yet for large numbers of emails
> > with a timely response. We've really looked at the gamut here, including
> > Solr, Lucene, Postgres (tsearch2), Sphinx, Xapian, CouchDB(!), and more.
> >
> > ab
>
>
> --
> --Noble Paul
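
Just to make sure I follow the 'userid' fq suggestion above, is it basically the SolrJ sketch below? (The field name, user id, and query are placeholders.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PerUserSearchSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder URL for the single shared Solr instance.
        CommonsHttpSolrServer solr =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Every document carries a 'userid' field; each search is restricted
        // to the current user with a filter query, so Solr's filter cache can
        // reuse the per-user document set across searches.
        SolrQuery q = new SolrQuery("subject:invoice"); // the user's actual query
        q.addFilterQuery("userid:12345");               // placeholder user id

        QueryResponse rsp = solr.query(q);
        System.out.println("matches for this user: "
            + rsp.getResults().getNumFound());
    }
}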
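
P.S. For anyone else following the LUCENE-1292 link above, my reading of the ParallelReader idea is roughly the Lucene sketch below: a large, mostly static mail index plus a small, frequently rebuilt tag index whose document numbers have to stay exactly aligned with the main one (which is the hard part the ticket is about). Paths and field names are made up.

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.ParallelReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class ParallelTagSketch {
    public static void main(String[] args) throws Exception {
        // Large, mostly static index holding the email bodies (placeholder path).
        IndexReader mail = IndexReader.open(FSDirectory.getDirectory("/index/mail"));
        // Small, frequently rebuilt index holding only the tag field; doc N here
        // must describe doc N in the mail index.
        IndexReader tags = IndexReader.open(FSDirectory.getDirectory("/index/tags"));

        // ParallelReader presents the two indexes as one logical index, so a
        // search can filter on tags without re-indexing the large mail documents.
        ParallelReader parallel = new ParallelReader();
        parallel.add(mail);
        parallel.add(tags);

        IndexSearcher searcher = new IndexSearcher(parallel);
        TopDocs hits = searcher.search(new TermQuery(new Term("tag", "important")), null, 10);
        System.out.println("tagged hits: " + hits.totalHits);

        searcher.close();
        parallel.close();
    }
}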