Each indexed document will represent an email, consisting of the typical
fields to/from/subject/cc/bcc/body/attachment/mailheaders, where the body
and attachment texts will be indexed and tokenized but not stored. It's
difficult to give an estimate of the number of such documents, other than
to say that
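
In Lucene terms, the field layout described above might look roughly like
the sketch below (the class name, the helper, and the store/tokenize
choices for to/from/subject are illustrative assumptions; only the
body/attachment text being tokenized and indexed but not stored comes
from the mail itself):

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;

  public class MailDocument {
    // Builds one Lucene Document per email. Body and attachment text
    // are tokenized and indexed but not stored, per the description.
    static Document build(String to, String from, String subject,
                          String body, String attachment) {
      Document doc = new Document();
      doc.add(new Field("to", to, Field.Store.YES, Field.Index.UN_TOKENIZED));
      doc.add(new Field("from", from, Field.Store.YES, Field.Index.UN_TOKENIZED));
      doc.add(new Field("subject", subject, Field.Store.YES, Field.Index.TOKENIZED));
      doc.add(new Field("body", body, Field.Store.NO, Field.Index.TOKENIZED));
      doc.add(new Field("attachment", attachment, Field.Store.NO, Field.Index.TOKENIZED));
      return doc;
    }
  }
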
If you are after faster disks, it might just be easier to use RAID.
If you want real scalability with a single-index view, you want
multiple machines (which Solr doesn't support yet).
If you can partition your data such that queries can be run against
single partitions, then use separate Solr servers.
So the thinking here was to divide the total indexed data among N
partitions, since the amount of data will be massive. Each partition would
probably use a separate physical disk (or disks), and then for searching I
could use ParallelMultiSearcher to dispatch searches to each of these
partitions as a single logical index.
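
A minimal sketch of that search side against the Lucene 1.9/2.x API (the
index paths and the query field are made up for illustration):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.Hits;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.ParallelMultiSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.Searchable;

  public class PartitionedSearch {
    public static void main(String[] args) throws Exception {
      // One IndexSearcher per partition, each index on its own disk.
      Searchable[] partitions = new Searchable[] {
          new IndexSearcher("/disk1/index"),
          new IndexSearcher("/disk2/index"),
          new IndexSearcher("/disk3/index"),
      };
      // ParallelMultiSearcher runs the query against every partition in
      // parallel and merges the hits into one ranked result list.
      ParallelMultiSearcher searcher = new ParallelMultiSearcher(partitions);
      Query q = new QueryParser("body", new StandardAnalyzer()).parse("solr");
      Hits hits = searcher.search(q);
      for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.score(i) + " " + hits.doc(i).get("subject"));
      }
      searcher.close();
    }
  }

Because it goes through MultiSearcher's weighting, document frequencies are
aggregated across the sub-indexes, so scores merge more consistently than
they would from N independent searches.
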
: Suppose I want the XML input submitted to Solr to be distributed among a
: fixed set of partitions; basically, something like round-robin among each of
: them, so that each directory has a relatively equal size in terms of # of
: segments. Is there an easy way to do this? I took a quick look a
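
Solr itself had no such distribution built in at this point, so the
round-robin would have to happen on the client side before the POST. A
rough sketch (the endpoint URLs are hypothetical):

  import java.io.OutputStream;
  import java.net.HttpURLConnection;
  import java.net.URL;

  public class RoundRobinUpdater {
    // Hypothetical partition endpoints; one Solr instance per partition.
    private final String[] endpoints = {
        "http://host1:8983/solr/update",
        "http://host2:8983/solr/update",
    };
    private int next = 0;

    // POSTs one <add>...</add> XML document to the next partition in
    // rotation, keeping the partitions roughly equal in size.
    public void post(String addXml) throws Exception {
      URL url = new URL(endpoints[next]);
      next = (next + 1) % endpoints.length;
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      conn.setRequestMethod("POST");
      conn.setDoOutput(true);
      conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
      OutputStream out = conn.getOutputStream();
      out.write(addXml.getBytes("UTF-8"));
      out.close();
      if (conn.getResponseCode() != 200) {
        throw new RuntimeException("update failed: " + conn.getResponseCode());
      }
      conn.disconnect();
    }
  }

One caveat with a pure counter: re-adding a document with the same unique
key must hit the same partition it was first sent to, or duplicates will
accumulate, so hashing the unique key is usually safer than round-robin.
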
On 4/27/06, David Trattnig <[EMAIL PROTECTED]> wrote:
> thank you so much! Could you also explain to me how to use these two
> Tokenizers?
Here's the HTMLStrip tokenizer description:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-031d5d370010955fdcc529d208395cd556f4a73e
Read through
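
The usual way to wire one of them in is through a fieldtype in schema.xml;
a minimal sketch (the fieldtype and field names are made up here, but the
factory classes are the ones documented on that wiki page):

  <!-- Strips HTML markup at tokenization time, then lower-cases tokens. -->
  <fieldtype name="htmlText" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldtype>

  <field name="body" type="htmlText" indexed="true" stored="false"/>
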
Hi Chris,
thank you so much! Could you also explain to me how to use these two
Tokenizers?
But if there is a Tokenizer which throws away HTML markup, it should also
be possible to extend it and exclude additional content easily?
TIA,
david
: will need to process that data you want to index (ie excl