Thanks for the responses; I also found this useful thread from back in early
'07:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200702.mbox/browser
(the "tagging" thread), and I might also start looking at facets as an
approach.
However, am I right in assuming that updating a facet field is still just as
heavy as a normal field update in Solr?  (I haven't worked much with facets
thus far.)
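
For concreteness, the facet side of what I have in mind would look roughly
like the sketch below (just Java over plain HTTP; the localhost URL and the
multi-valued "tags" field are placeholders, and I'm assuming the tags are
simply indexed fields on each email document):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

// Sketch only: assumes a local Solr instance and a multi-valued "tags" field.
public class TagFacets {
    public static void main(String[] args) throws Exception {
        String url = "http://localhost:8983/solr/select"
                + "?q=" + URLEncoder.encode("subject:report", "UTF-8")
                + "&rows=0"            // only the facet counts are wanted
                + "&facet=true"
                + "&facet.field=tags"  // assumed tag field name
                + "&facet.mincount=1";
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);  // XML response with per-tag counts
        }
        in.close();
    }
}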

I also saw this ticket for core Lucene using a ParallelReader, but I'm not
sure how much traction it has:
https://issues.apache.org/jira/browse/LUCENE-1292
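
If that ticket goes anywhere, my understanding of the ParallelReader pattern
is roughly the sketch below (against the current Lucene 2.x API as I read it;
the index paths are made up, and the two indexes would have to keep their
documents in exactly the same order):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.ParallelReader;
import org.apache.lucene.search.IndexSearcher;

// Sketch only: static email fields live in one index, the fast-changing tag
// field(s) in a second, parallel index with documents in the same order.
// Only the small tag index would be rebuilt when tags change.
public class ParallelTagIndex {
    public static void main(String[] args) throws IOException {
        IndexReader emails = IndexReader.open("/path/to/email-index"); // static fields
        IndexReader tags = IndexReader.open("/path/to/tag-index");     // dynamic tag field(s)

        ParallelReader parallel = new ParallelReader();
        parallel.add(emails);
        parallel.add(tags);

        // Queries can now reference fields from either index as if each
        // email and its tags were a single document.
        IndexSearcher searcher = new IndexSearcher(parallel);
        System.out.println("docs visible through the parallel view: " + parallel.maxDoc());
    }
}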



I understand the limitations on partitioning, and hashing users to a cluster
of instances is definitely an acceptable pattern for us, so we should be OK
there.
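
Concretely, the routing we'd do is something like this (sketch only; the
shard URLs and the 'userid' field are placeholders, and the fq filter follows
the suggestion in the reply quoted below):

import java.net.URLEncoder;

// Sketch only: hash each user onto one of N Solr instances, and restrict
// every query to that user's documents with an fq filter (per the reply
// quoted below).  Shard URLs and the "userid" field are placeholders.
public class UserRouting {
    private static final String[] SHARDS = {
        "http://solr1:8983/solr",
        "http://solr2:8983/solr",
        "http://solr3:8983/solr"
    };

    static String shardFor(String userId) {
        int bucket = (userId.hashCode() & 0x7fffffff) % SHARDS.length;
        return SHARDS[bucket];
    }

    static String searchUrl(String userId, String query) throws Exception {
        return shardFor(userId) + "/select"
                + "?q=" + URLEncoder.encode(query, "UTF-8")
                + "&fq=" + URLEncoder.encode("userid:" + userId, "UTF-8"); // per-user filter, cached by Solr
    }

    public static void main(String[] args) throws Exception {
        System.out.println(searchUrl("user42", "subject:invoice"));
    }
}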

ab


On Thu, Jul 10, 2008 at 12:06 AM, Noble Paul നോബിള്‍ नोब्ळ् <[EMAIL PROTECTED]> wrote:

> On Thu, Jul 10, 2008 at 7:53 AM, aris buinevicius <[EMAIL PROTECTED]> wrote:
> > We're trying to implement a large scale domain specific web email
> > application, and so far solr performance on the search side is really
> > doing well for us.
> >
> > There are two limitations that I can't seem to get around however, and
> > was hoping for some advice.
> >
> > 1. We would like to do bulk tagging on large query result sets (ie, if
> > you have 1M emails, do a search, and then you wish to apply a tag to the
> > result set of, say, 250k results).   I've tried many approaches, but the
> > closest support I could see was the update field functionality in
> > SOLR-139.   Is there any other way to keep the very dynamic metadata
> > (tags and other fields) abstracted away from the static documents
> > themselves?   I've researched joining against a metadata database, but
> > unfortunately the join logic for large results is just too bulky to
> > perform well at scale.  We've also looked at postgres tsearch2, but that
> > also breaks down with a large number of emails.
> Updating a large number of docs in one go is a bit expensive. SOLR-139 is
> trying to achieve that, but it is still expensive. If the users do not
> tag the docs too often then it may be OK.
> >
> > 2. We're assuming we'll have thousands of users with independent data;
> > any good way to partition multiple indexes with solr?   With Lucene we
> > could just save those in independent directories, and cache the index
> > while the user session is active.   I saw some configurations on tomcat
> > that would allow multiple instances, but that's probably not practical
> > for lots of concurrent users.
> Maintaining multiple indices is not a good idea. Add an extra
> attribute 'userid' to each document and filter each search with the
> user id as an 'fq'. The caches in Solr will automatically take care of
> the rest.
> >
> > Thanks for any tips; would love to use Solr (or Lucene), but haven't been
> > able to get around issue 1 yet for large numbers of emails with
> > acceptable response times.   We've really looked at the gamut here,
> > including solr, lucene, postgres (tsearch2), sphinx, xapian, couchdb(!),
> > and more.
> >
> > ab
> >
>
>
>
> --
> --Noble Paul
>
