Our plan for the VM is just benchmarking, not production. We will turn off all guest machines, then configure a Solr VM. Then we'll tweak memory and see what effect it has on indexing and searching. Then we'll reconfigure the number of processors used and see what that does. Then again with more disk space. And so on. We'll try to start with a reasonable configuration and then make intelligent guesses for our changes so we don't spend a year on this.
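A minimal sketch of how that change-one-thing-at-a-time sweep could be enumerated and timed. The baseline and candidate values are illustrative assumptions (loosely based on the 8GB machine mentioned in the thread), and `workload` stands in for whatever indexing or search run is being measured. Reconfiguring the VM between runs still happens by hand; this only lists the configurations and times a run.

```python
import statistics
import time
from typing import Callable, Dict, Iterator, List

# Illustrative starting point and candidate values (assumptions, not measured advice).
BASELINE: Dict[str, int] = {"memory_gb": 8, "cpus": 4, "disk_gb": 256}
VARIATIONS: Dict[str, List[int]] = {
    "memory_gb": [4, 8, 16],
    "cpus": [2, 4, 8],
}


def one_at_a_time(baseline: Dict[str, int],
                  variations: Dict[str, List[int]]) -> Iterator[Dict[str, int]]:
    """Yield the baseline, then configs that differ from it in exactly one setting."""
    yield dict(baseline)
    for key, values in variations.items():
        for value in values:
            if value != baseline[key]:
                yield {**baseline, key: value}


def median_seconds(workload: Callable[[], None], repeats: int = 3) -> float:
    """Run the workload `repeats` times and return the median wall-clock time."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Varying one setting at a time keeps the number of runs small (five here instead of nine for the full grid), at the cost of missing interactions between memory and CPU.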
What we are trying to avoid is configuring a brand new box at the hoster, only to find we need a bigger and better box. Or, paying too much for something we don't need.

Thanks everyone for your input, it was very helpful.

cheers,

Travis

On Tue, Oct 11, 2011 at 2:19 PM, eks dev <eks...@yahoo.co.uk> wrote:

> Re. "I have little experience with VM servers for search."
>
> We had huge performance penalty on VMs, CPU was bottleneck.
> We couldn't freely run measurements to figure out what the problem really
> was (hosting was contracted by customer...), but it was something pretty
> scary, kind of 8-10 times slower than advertised dedicated equivalent.
> Whatever its worth, if you can afford it, keep lucene away from it. Lucene
> is highly optimized machine, and someone twiddling with context switches is
> not welcome there.
>
> Of course, if you get IO bound, it makes no big diff anyhow.
>
> This is just my singular experience, might be the hosting team did not
> configure it right, or something changed in meantime (~ 4 Years old
> experience), but we burnt our fingers that hard I still remember it
>
> On Tue, Oct 11, 2011 at 7:49 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>
> > Travis Low [t...@4centurion.com] wrote:
> > > Toke, thanks. Comments embedded (hope that's okay):
> >
> > Inline or top-posting? Long discussion, but for mailing lists I clearly
> > prefer the former.
> >
> > [Toke: Estimate characters]
> >
> > > Yes. We estimate each of the 23K DB records has 600 pages of text for the
> > > combined documents, 300 words per page, 5 characters per word. Which
> > > coincidentally works out to about 21GB, so good guessing there. :)
> >
> > Heh. Lucky Guess indeed, although the factors were off. Anyway, 21GB does
> > not sound scary at all.
> >
> > > The way it works is we have researchers modifying the DB records during the
> > > day, and they may upload documents at that time. We estimate 50-60 uploads
> > > throughout the day.
> > > If possible, we'd like to index them as they are
> > > uploaded, but if that would negatively affect the search, then we can
> > > rebuild the index nightly.
> > >
> > > Which is better?
> >
> > The analyzing part is only CPU and you're running multi-core so as long as
> > you only analyze using one thread you're safe there. That leaves us with
> > I/O: Even for spinning drives, a daily load of just 60 updates of 1MB of
> > extracted text each shouldn't have any real effect - with the usual caveat
> > that large merges should be avoided by either optimizing at night or
> > tweaking merge policy to avoid large segments. With such a relatively small
> > index, (re)opening and warm up should be painless too.
> >
> > Summary: 300GB is a fair amount of data and takes some power to crunch.
> > However, in the Solr/Lucene end your index size and your update rates are
> > nothing to worry about. Usual caveat for advanced use and all that applies.
> >
> > [Toke: i7, 8GB, 1TB spinning, 256GB SSD]
> >
> > > We have a very beefy VM server that we will use for benchmarking, but your
> > > specs provide a starting point. Thanks very much for that.
> >
> > I have little experience with VM servers for search. Although we use a lot
> > of virtual machines, we use dedicated machines for our searchers, primarily
> > to ensure low latency for I/O. They might be fine for that too, but we
> > haven't tried it yet.
> >
> > Glad to be of help,
> > Toke

--
Travis Low, Director of Development
<t...@4centurion.com>
Centurion Research Solutions, LLC
14048 ParkEast Circle • Suite 100 • Chantilly, VA 20151
703-956-6276 • 703-378-4474 (fax)
http://www.centurionresearch.com
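For reference, the size estimate quoted in the thread can be checked with a few lines of arithmetic. This is only a sketch: it assumes one byte per character and ignores spaces and punctuation, and the constant names are made up for illustration. The 60 x 1MB daily load is the upper figure mentioned in the thread.

```python
# Figures quoted in the thread.
RECORDS = 23_000          # "23K DB records"
PAGES_PER_RECORD = 600
WORDS_PER_PAGE = 300
CHARS_PER_WORD = 5

# Assumption: one byte per character, spaces and punctuation not counted.
total_chars = RECORDS * PAGES_PER_RECORD * WORDS_PER_PAGE * CHARS_PER_WORD
total_gb = total_chars / 1e9  # decimal gigabytes

# Daily update load: up to 60 uploads of roughly 1MB of extracted text each.
daily_update_mb = 60 * 1
daily_fraction = daily_update_mb / (total_gb * 1000)

print(f"estimated text: {total_gb:.1f} GB")
print(f"daily updates:  {daily_update_mb} MB ({daily_fraction:.2%} of the text)")
```

Under those assumptions the text comes to 20.7 GB, which matches the "about 21GB" figure, and a day's updates are well under one percent of it.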