Each indexed document will represent an email, consisting of the typical fields to/from/subject/cc/bcc/body/attachment/mailheaders, where the body and attachment text will be indexed and tokenized but not stored. It's difficult to give an estimate of the number of such documents, other than to say it would be similar to what a small to midsize corporation would generate. The system would have to cover the total volume of email generated up to a certain date range in the past (to start out), then continuously add incremental additions on a daily basis moving forward.
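In raw Lucene terms, the mapping I have in mind looks roughly like the sketch below (just a sketch; the storage and analysis choices for the non-body fields are guesses on my part, and the real setup would of course be expressed declaratively in Solr's schema rather than in code):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class EmailDoc {
    // Sketch only (current Lucene API): one Document per email.
    // Field names match the list above; whether the address/header fields
    // are stored, and how they are analyzed, is still an open detail.
    public static Document build(String to, String from, String subject,
                                 String cc, String bcc, String headers,
                                 String bodyText, String attachmentText) {
        Document doc = new Document();
        doc.add(new Field("to",          to,      Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("from",        from,    Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("subject",     subject, Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("cc",          cc,      Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("bcc",         bcc,     Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("mailheaders", headers, Field.Store.YES, Field.Index.TOKENIZED));
        // body and attachment text: indexed and tokenized, but not stored
        doc.add(new Field("body",        bodyText,       Field.Store.NO, Field.Index.TOKENIZED));
        doc.add(new Field("attachment",  attachmentText, Field.Store.NO, Field.Index.TOKENIZED));
        return doc;
    }
}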
-John

On 4/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> If you are after faster disks, it might just be easier to use RAID.
> If you want real scalability with a single-index view, you want
> multiple machines (which Solr doesn't support yet).
>
> If you can partition your data such that queries can be run against
> single partitions, then use separate Solr servers and put different
> parts of the collection on each server. Then make a smart front-end
> that queries the correct collection based on something in the data.
>
> > So the thinking here was to divide the total indexed data among N partitions
> > since the amount of data will be massive.
>
> How much data? (number of docs, number of indexed fields per doc,
> size of all indexed fields, etc)
>
> -Yonik
>
> On 4/27/06, Johnny Monsod <[EMAIL PROTECTED]> wrote:
> > So the thinking here was to divide the total indexed data among N partitions
> > since the amount of data will be massive. Each partition would probably be
> > using a separate physical disk(s), and then for searching I could use
> > ParallelMultiSearcher to dispatch searches to each of these partitions as a
> > separate Searchable. I know that the Lucene doc mentioned that there is
> > really not much gain in using ParallelMultiSearcher versus MultiSearcher
> > (sequential of a bunch of searchables) when using it against a single disk,
> > so if we had separate physical disks, the parallel version might be of more
> > tangible benefit.
> >
> > -John
> >
> > On 4/27/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> > >
> > > : Suppose I want the xml input submitted to solr to be distributed among a
> > > : fixed set of partitions; basically, something like round-robin among each of
> > > : them, so that each directory has a relatively equal size in terms of # of
> > > : segments. Is there an easy way to do this? I took a quick look at the solr
> > >
> > > I'm not sure if i'm understanding your question: What would the
> > > motivation be for doing something like this? ... what would the usage be
> > > like from a search perspective one you had built up these directories?
> > >
> > > -Hoss
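P.S. For concreteness, the partitioned-search idea quoted above would look roughly like this at the Lucene level (a sketch only; the index paths and the query here are hypothetical placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Searcher;

public class PartitionedSearch {
    public static void main(String[] args) throws Exception {
        // One IndexSearcher per partition, each index directory ideally on its
        // own physical disk. The paths are made-up placeholders.
        Searchable[] partitions = new Searchable[] {
            new IndexSearcher("/disk1/mail-index-0"),
            new IndexSearcher("/disk2/mail-index-1"),
            new IndexSearcher("/disk3/mail-index-2"),
        };

        // ParallelMultiSearcher runs the query against each Searchable in its
        // own thread, unlike MultiSearcher, which searches them sequentially.
        Searcher searcher = new ParallelMultiSearcher(partitions);

        Query query = new QueryParser("body", new StandardAnalyzer()).parse("quarterly report");
        Hits hits = searcher.search(query);
        System.out.println("total hits across partitions: " + hits.length());

        searcher.close();
    }
}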