Re: Capacity Planning Guidance

2012-07-13 Thread Erick Erickson
This question, reasonable as it appears, is just unanswerable in the abstract. About all you can do is prototype and test. Take "facet queries". The hardware requirements vary drastically based on the number of unique values in the field(s) you're faceting on, as well as whether they're multi-value

Re: capacity planning

2011-10-13 Thread Shawn Heisey
On 10/11/2011 11:49 AM, Toke Eskildsen wrote:
> Inline or top-posting? Long discussion, but for mailing lists I clearly prefer the former.

Ditto. ;)

I have little experience with VM servers for search. Although we use a lot of virtual machines, we use dedicated machines for our searchers, prim

RE: capacity planning

2011-10-13 Thread Jaeger, Jay - DOT
-----Original Message-----
From: eks...@googlemail.com [mailto:eks...@googlemail.com] On Behalf Of eks dev
Sent: Tuesday, October 11, 2011 1:20 PM
To: solr-user@lucene.apache.org
Subject: Re: capacity planning

Re. "I have little experience with VM servers for search." We had huge performance penalty on VMs

Re: capacity planning

2011-10-11 Thread Travis Low
Our plan for the VM is just benchmarking, not production. We will turn off all guest machines, then configure a Solr VM. Then we'll tweak memory and see what effect it has on indexing and searching. Then we'll reconfigure the number of processors used and see what that does. Then again with mor

Re: capacity planning

2011-10-11 Thread eks dev
Re. "I have little experience with VM servers for search." We had huge performance penalty on VMs, CPU was bottleneck. We couldn't freely run measurements to figure out what the problem really was (hosting was contracted by customer...), but it was something pretty scary, kind of 8-10 times slowe

Re: capacity planning

2011-10-11 Thread Otis Gospodnetic
the fields.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

> From: Erik Hatcher
> To: solr-user@lucene.apache.org
> Sent: Tuesday, October 11, 2011 9:49 AM
> Subject: Re: ca

Re: capacity planning

2011-10-11 Thread Toke Eskildsen
Travis Low [t...@4centurion.com] wrote:
> Toke, thanks. Comments embedded (hope that's okay):

Inline or top-posting? Long discussion, but for mailing lists I clearly prefer the former.

[Toke: Estimate characters]

> Yes. We estimate each of the 23K DB records has 600 pages of text for the
> co

Re: capacity planning

2011-10-11 Thread Travis Low
Toke, thanks. Comments embedded (hope that's okay):

On Tue, Oct 11, 2011 at 10:52 AM, Toke Eskildsen wrote:
> > Greetings. I have a paltry 23,000 database records that point to a
> > voluminous 300GB worth of PDF, Word, Excel, and other documents. We are
> > planning on indexing the records a

Re: capacity planning

2011-10-11 Thread Toke Eskildsen
On Tue, 2011-10-11 at 14:36 +0200, Travis Low wrote:
> Greetings. I have a paltry 23,000 database records that point to a
> voluminous 300GB worth of PDF, Word, Excel, and other documents. We are
> planning on indexing the records and the documents they point to. I have no
> clue on how we can c

Re: capacity planning

2011-10-11 Thread Travis Low
Thanks, Erik! We probably won't use highlighting. Also, documents are added but *never* deleted. Does anyone have comments about memory and CPU resources required for indexing the 300GB of documents in a "reasonable" amount of time? It's okay if the initial indexing takes hours or maybe even da

Re: capacity planning

2011-10-11 Thread Paul Libbrecht
My experience was 10% of the size.

On 11 Oct 2011 at 15:49, Erik Hatcher wrote:
> (roughly 35% the size, generally).
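
For a rough sense of what these two rules of thumb imply, here is a minimal sketch that applies the 10% and 35% figures quoted above to the 300GB corpus mentioned elsewhere in the thread. The function name and the choice to express ratios as whole percentages are assumptions made for illustration. The ratios are anecdotal, and the real index size depends on stored fields, analysis, and how much extractable text the files actually contain, so the output is only a bracket to test against.

```python
def estimate_index_gb(source_gb, percent_of_source):
    """Estimate index size in GB as a percentage of the source size.

    Hypothetical helper: the percentage is a rule of thumb, not a measurement.
    """
    return source_gb * percent_of_source / 100


SOURCE_GB = 300  # corpus size quoted in the thread

# Rules of thumb quoted in the thread, as whole percentages.
RATIOS = {
    "10% (Paul Libbrecht's experience)": 10,
    "35% (Erik Hatcher's rough figure)": 35,
}

estimates = {label: estimate_index_gb(SOURCE_GB, pct) for label, pct in RATIOS.items()}

for label, size_gb in estimates.items():
    print(f"{label}: about {size_gb:.0f} GB")
```

Treat the two results as a low and a high guess for disk sizing, then replace them with numbers measured on a prototype index.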

Re: capacity planning

2011-10-11 Thread Erik Hatcher
Travis - Whether the index is bigger than the original content depends on what you need to do with it in Solr. One of the primary deciding factors is if you need to use highlighting, which currently requires the fields to be highlighted be stored. Stored fields will take up about the same spa