As long as Solr/Lucene makes smart use of memory (and in my experience it does), it is easy to estimate how long a huge query or update will take once you know how long the smaller ones take. Just keep in mind that memory and disk-space consumption almost always grow in proportion to the amount of data indexed.
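For example, this is the kind of back-of-the-envelope arithmetic I mean (a rough sketch in Python; the batch sizes and timings below are invented for illustration):

    # Rough sketch: time a few small indexing batches, fit a straight line
    # through the origin, and project the full job. The timings here are
    # invented; the point is only the linear extrapolation.

    trials = [(10_000, 95.0), (20_000, 190.0), (40_000, 385.0)]  # (docs, seconds)

    # Least-squares slope through the origin: seconds per document.
    secs_per_doc = sum(n * t for n, t in trials) / sum(n * n for n, _ in trials)

    target_docs = 1_000_000
    hours = secs_per_doc * target_docs / 3600
    print(f"estimated {hours:.1f} hours for {target_docs:,} docs")

This only holds while scaling stays roughly linear, i.e. while the index still fits comfortably in RAM, as discussed below.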
2008/8/19 Mike Klaas <[EMAIL PROTECTED]>

> On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
>
>> So your experience differs from Mike's. Obviously it's an important
>> decision as to whether to buy more machines. Can you (or Mike) weigh in
>> on what factors led to your different take on local shards vs. shards
>> distributed across machines?
>
> I do both; the only reason I have two shards on each machine is to
> squeeze maximum performance out of an equipment budget. Err on the side
> of multiple machines.
>
>>> At least for building the index, the number of shards really does
>>> help. Indexing Medline (1.6e7 docs, which is 60GB in XML text) on a
>>> single machine starts at about 100 doc/s but slows down to 10 doc/s as
>>> the index grows. It seems as though the limit is reached once you run
>>> out of RAM, and indexing gets slower in a linear fashion the larger
>>> the index gets.
>>> My sweet spot was 5 machines with 8GB RAM for indexing about 60GB of
>>> data.
>>
>> Can you say what the specs were for these machines? Given that I have
>> more like 1TB of data over 1M docs, how do you think my machine
>> requirements might be affected as compared to yours?
>
> You are in a much better position to determine this than we are. See how
> big an index you can put on a single machine while maintaining acceptable
> performance using a typical query load. It's relatively safe to
> extrapolate linearly from that.
>
> -Mike

--
Alexander Ramos Jardim
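As a concrete sketch of the distributed setup discussed in the thread, a query fanned out across shards on separate machines looks something like this (host names are hypothetical; the "shards" request parameter is Solr's standard distributed-search mechanism):

    # Hypothetical shard hosts; Solr fans the query out to every shard named
    # in the "shards" parameter and merges the results into one response.
    from urllib.parse import urlencode
    from urllib.request import urlopen

    shards = "solr1:8983/solr,solr2:8983/solr,solr3:8983/solr"
    params = urlencode({"q": "medline", "shards": shards, "wt": "json"})

    with urlopen("http://solr1:8983/solr/select?" + params) as resp:
        print(resp.read(500))  # first bytes of the merged JSON result

Any of the shard hosts can coordinate the request; whether those shards live on one box or five is exactly the local-vs-distributed trade-off debated above.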