Each indexed document will represent an email, consisting of the typical fields to/from/subject/cc/bcc/body/attachment/mailheaders, where the body and attachment text will be indexed and tokenized but not stored. It's difficult to give an estimate of the number of such documents, other than to say it would be similar to what a small to midsize corporation would generate. The system would have to cover the total volume of email generated up to a certain date range in the past (to start out), then continuously add incremental additions on a daily basis moving forward.
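In raw Lucene terms, the mapping I have in mind looks roughly like the sketch below (just a sketch; the storage and analysis choices for the non-body fields are guesses on my part, and the real setup would of course be expressed declaratively in Solr's schema rather than in code):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class EmailDoc {
    // Sketch only (current Lucene API): one Document per email.
    // Field names match the list above; whether the address/header fields
    // are stored, and how they are analyzed, is still an open detail.
    public static Document build(String to, String from, String subject,
                                 String cc, String bcc, String headers,
                                 String bodyText, String attachmentText) {
        Document doc = new Document();
        doc.add(new Field("to",          to,      Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("from",        from,    Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("subject",     subject, Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("cc",          cc,      Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("bcc",         bcc,     Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("mailheaders", headers, Field.Store.YES, Field.Index.TOKENIZED));
        // body and attachment text: indexed and tokenized, but not stored
        doc.add(new Field("body",        bodyText,       Field.Store.NO, Field.Index.TOKENIZED));
        doc.add(new Field("attachment",  attachmentText, Field.Store.NO, Field.Index.TOKENIZED));
        return doc;
    }
}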
-John

On 4/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> If you are after faster disks, it might just be easier to use RAID.
> If you want real scalability with a single-index view, you want
> multiple machines (which Solr doesn't support yet).
>
> If you can partition your data such that queries can be run against
> single partitions, then use separate Solr servers and put different
> parts of the collection on each server. Then make a smart front-end
> that queries the correct collection based on something in the data.
>
> > So the thinking here was to divide the total indexed data among N partitions
> > since the amount of data will be massive.
>
> How much data? (number of docs, number of indexed fields per doc,
> size of all indexed fields, etc)
>
> -Yonik
>
> On 4/27/06, Johnny Monsod <[EMAIL PROTECTED]> wrote:
> > So the thinking here was to divide the total indexed data among N partitions
> > since the amount of data will be massive. Each partition would probably be
> > using a separate physical disk(s), and then for searching I could use
> > ParallelMultiSearcher to dispatch searches to each of these partitions as a
> > separate Searchable. I know that the Lucene doc mentioned that there is
> > really not much gain in using ParallelMultiSearcher versus MultiSearcher
> > (sequential of a bunch of searchables) when using it against a single disk,
> > so if we had separate physical disks, the parallel version might be of more
> > tangible benefit.
> >
> > -John
> >
> > On 4/27/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> > >
> > > : Suppose I want the xml input submitted to solr to be distributed among a
> > > : fixed set of partitions; basically, something like round-robin among each of
> > > : them, so that each directory has a relatively equal size in terms of # of
> > > : segments. Is there an easy way to do this? I took a quick look at the solr
> > >
> > > I'm not sure if i'm understanding your question: What would the
> > > motivation be for doing something like this? ... what would the usage be
> > > like from a search perspective one you had built up these directories?
> > >
> > > -Hoss
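P.S. For concreteness, the partitioned-search idea quoted above would look roughly like this at the Lucene level (a sketch only; the index paths and the query here are hypothetical placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Searcher;

public class PartitionedSearch {
    public static void main(String[] args) throws Exception {
        // One IndexSearcher per partition, each index directory ideally on its
        // own physical disk. The paths are made-up placeholders.
        Searchable[] partitions = new Searchable[] {
            new IndexSearcher("/disk1/mail-index-0"),
            new IndexSearcher("/disk2/mail-index-1"),
            new IndexSearcher("/disk3/mail-index-2"),
        };

        // ParallelMultiSearcher runs the query against each Searchable in its
        // own thread, unlike MultiSearcher, which searches them sequentially.
        Searcher searcher = new ParallelMultiSearcher(partitions);

        Query query = new QueryParser("body", new StandardAnalyzer()).parse("quarterly report");
        Hits hits = searcher.search(query);
        System.out.println("total hits across partitions: " + hits.length());

        searcher.close();
    }
}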