Re: distributing indexes via solr

2006-04-27 Thread Johnny Monsod
Each indexed document will represent an email, consisting of the typical fields to/from/subject/cc/bcc/body/attachment/mailheaders where the body and attachment texts will be indexed and tokenized but not stored. It's difficult to give an estimate of the # of such documents, other than to say that

Re: distributing indexes via solr

2006-04-27 Thread Yonik Seeley
If you are after faster disks, it might just be easier to use RAID. If you want real scalability with a single-index view, you want multiple machines (which Solr doesn't support yet). If you can partition your data such that queries can be run against single partitions, then use separate Solr serv

Re: distributing indexes via solr

2006-04-27 Thread Johnny Monsod
So the thinking here was to divide the total indexed data among N partitions since the amount of data will be massive. Each partition would probably be using a separate physical disk(s), and then for searching I could use ParallelMultiSearcher to dispatch searches to each of these partitions as a
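A minimal sketch of the fan-out search idea described above, written in plain Python rather than against the Lucene API. Everything here is an assumption for illustration: `make_partition` and `search_all` are hypothetical names, each partition is modeled as a callable returning `(score, doc_id)` pairs, and scores are assumed to be comparable across partitions, which is not generally true for separately built indexes.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def make_partition(docs):
    """Build a toy in-memory partition (hypothetical stand-in for one index).

    docs maps doc_id -> text; the score is simply the term count.
    """
    def search(query):
        hits = []
        for doc_id, text in docs.items():
            count = text.lower().split().count(query)
            if count:
                hits.append((count, doc_id))
        return hits
    return search


def search_all(partitions, query, top_n=10):
    """Send the query to every partition in parallel and merge by score."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        per_partition = list(pool.map(lambda search: search(query), partitions))
    merged = [hit for hits in per_partition for hit in hits]
    # Keep only the best top_n hits across all partitions.
    return heapq.nlargest(top_n, merged, key=lambda hit: hit[0])


part_a = make_partition({"a1": "solr index solr", "a2": "lucene"})
part_b = make_partition({"b1": "solr"})
results = search_all([part_a, part_b], "solr", top_n=2)
```

Each partition is searched independently, so the merge step is the only place where results from different disks or machines meet.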

Re: distributing indexes via solr

2006-04-27 Thread Chris Hostetter
: Suppose I want the xml input submitted to solr to be distributed among a
: fixed set of partitions; basically, something like round-robin among each of
: them, so that each directory has a relatively equal size in terms of # of
: segments. Is there an easy way to do this? I took a quick look a
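A minimal sketch of the round-robin distribution asked about in the quoted question, assuming documents are identified by an id and partitions by a name. `assign_round_robin` and the partition names are hypothetical and not part of Solr. Note that this balances the number of documents per partition, which is only a rough proxy for the number of segments.

```python
from itertools import cycle


def assign_round_robin(doc_ids, partitions):
    """Assign each document id to a partition in strict rotation."""
    if not partitions:
        raise ValueError("at least one partition is required")
    assignment = {name: [] for name in partitions}
    targets = cycle(partitions)
    for doc_id in doc_ids:
        # Each incoming document goes to the next partition in the cycle.
        assignment[next(targets)].append(doc_id)
    return assignment


batches = assign_round_robin(range(7), ["part0", "part1", "part2"])
```

With seven documents and three partitions, the counts differ by at most one document between any two partitions.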

Re: Solr is indexing XML only?

2006-04-27 Thread Yonik Seeley
On 4/27/06, David Trattnig <[EMAIL PROTECTED]> wrote:
> thank you so much! Could you also explain to me how to use these two
> Tokenizers?
Here's the HTMLStrip tokenizer description:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-031d5d370010955fdcc529d208395cd556f4a73e
Read throug

Re: Solr is indexing XML only?

2006-04-27 Thread David Trattnig
Hi Chris, thank you so much! Could you also explain to me how to use these two Tokenizers? And if there is a Tokenizer that throws away HTML markup, shouldn't it also be possible to extend it to exclude additional content easily? TIA, david
: will need to process that data you want to index (ie excl