You could run the Lucene benchmark module and compare. Or look at ActionGenerator from Sematext on GitHub, which you could also use for performance testing and comparison.
Otis
Solr & ElasticSearch Support
http://sematext.com/

On Feb 14, 2013 10:56 AM, "Michael Della Bitta" <michael.della.bi...@appinions.com> wrote:

> Or perhaps we should develop our own, Solr-based benchmark...
>
> Michael Della Bitta
>
> ------------------------------------------------
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Thu, Feb 14, 2013 at 10:54 AM, Michael Della Bitta
> <michael.della.bi...@appinions.com> wrote:
> > My dual-core, HT-enabled Dell Latitude from last year has this CPU:
> > model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
> > bogomips : 4988.65
> >
> > An m3.xlarge reports:
> > model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
> > bogomips : 4000.14
> >
> > I tried running geekbench and phoronix-test-suite and failed at both...
> > Anybody have a favorite, free, CLI benchmarking suite?
> >
> > Michael Della Bitta
> >
> > ------------------------------------------------
> > Appinions
> > 18 East 41st Street, 2nd Floor
> > New York, NY 10017-6271
> >
> > www.appinions.com
> >
> > Where Influence Isn’t a Game
> >
> >
> > On Thu, Feb 14, 2013 at 8:10 AM, Jack Krupansky <j...@basetechnology.com> wrote:
> >> That raises the question of how your average professional notebook computer
> >> (PC or Mac or Linux) compares to a garden-variety cloud server such as an
> >> Amazon EC2 m1.large (or m3.xlarge) in terms of performance, such as document
> >> ingestion rate or how many documents you can load before load and/or query
> >> performance starts to fall off the cliff. Anybody have any numbers? I mean,
> >> is a MacBook Pro half of an EC2 m1.large? Twice? Less? More? Any rough feel?
> >> (With all the usual caveats that "it all depends" and "your mileage will
> >> vary.") But the intent would be for a similar workload on both (like loading
> >> the Wikipedia dump.)
> >>
> >> -- Jack Krupansky
> >>
> >> -----Original Message-----
> >> From: Erick Erickson
> >> Sent: Thursday, February 14, 2013 7:31 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: What should focus be on hardware for solr servers?
> >>
> >> One data point: I can comfortably index and search the Wikipedia dump (11M
> >> articles, 5M with text) on my MacBook Pro. Admittedly not heavy-duty
> >> queries, but....
> >>
> >> Erick
> >>
> >>
> >> On Wed, Feb 13, 2013 at 4:01 PM, Matthew Shapiro <m...@mshapiro.net> wrote:
> >>
> >>> Excellent, thank you very much for the reply!
> >>>
> >>> On Wed, Feb 13, 2013 at 2:08 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> >>>
> >>> > Matthew Shapiro [m...@mshapiro.net] wrote:
> >>> >
> >>> > > Sorry, I should clarify our current statistics. First of all, I meant
> >>> > > 183k documents (not 183, woops). Around 100k of those are full-fledged
> >>> > > html articles (not web pages but articles in our CMS with html content
> >>> > > inside of them),
> >>> >
> >>> > If an article is around 10-30 pages (or the equivalent), this is still a
> >>> > small corpus.
> >>> >
> >>> > > the rest of the data are more like key/value data records with a lot
> >>> > > of attached meta data for searching.
> >>> >
> >>> > If the amount of unique categories (model, author, playtime, lix,
> >>> > favorite_band, year...) in the meta data is in the lower hundreds, you
> >>> > should be fine.
> >>> >
> >>> > > Also, what I meant by search without a search term is that probably > 80%
> >>> > > (hard to confirm due to the lack of stats given by the GSA) of our
> >>> > > searches are done on pure metadata clauses without any searching through
> >>> > > the content itself,
> >>> >
> >>> > That clarifies a lot, thanks. So we have roughly speaking 4000*5
> >>> > queries/day ~= 14 queries/minute. Guessing wildly that your peak time
> >>> > traffic is about 5 times that, we end up with about 1 query/second. That
> >>> > is a very light load for the Solr installation we're discussing.
> >>> >
> >>> > > so for example "give me documents that have a content type of video,
> >>> > > that are marked for client X, have a category of Y or Z, and was
> >>> > > published to platform A, ordered by date published".
> >>> >
> >>> > That is a near-trivial query and you should get a reply very fast on
> >>> > modest hardware.
> >>> >
> >>> > > The searches that use a search term are more like: use the same query
> >>> > > from the example as before, but find me all the documents that have
> >>> > > the string "My Video" in its title and description.
> >>> >
> >>> > Unless you experiment with fuzzy matches and phrase slop, this should
> >>> > also be fast. Ignoring analyzers, there is practically no difference
> >>> > between a meta data field and a larger content field in Solr.
> >>> >
> >>> > Your current search (guessing here) iterates all terms in the content
> >>> > fields and takes a comparatively large penalty when a large document is
> >>> > encountered. With the inverted index in Solr, the search terms are
> >>> > looked up in a dictionary that points to the documents they belong to.
> >>> > The penalty for having thousands or millions of terms in a field, as
> >>> > compared to tens or hundreds, is very small in an inverted index.
> >>> >
> >>> > We're still in "any random machine you've got available"-land, so I
> >>> > second Michael's suggestion.
> >>> >
> >>> > Regards,
> >>> > Toke Eskildsen
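
To make Toke's point concrete: the metadata-only request Matthew describes maps directly onto Solr filter queries. Below is a minimal sketch in Python (standard library only), where the field names (content_type, client, category, platform, published_date), the core name "cms", and the host are assumptions for illustration, not anything taken from Matthew's actual schema.

    # Metadata-only search: no free-text term, just filter queries and a sort.
    # Field names, core name and host below are hypothetical placeholders.
    import json
    import urllib.parse
    import urllib.request

    params = urllib.parse.urlencode([
        ("q", "*:*"),                      # match everything; the filters do the work
        ("fq", "content_type:video"),      # content type of video
        ("fq", "client:X"),                # marked for client X
        ("fq", "category:(Y OR Z)"),       # category of Y or Z
        ("fq", "platform:A"),              # published to platform A
        ("sort", "published_date desc"),   # ordered by date published
        ("wt", "json"),
    ])

    url = "http://localhost:8983/solr/cms/select?" + params
    with urllib.request.urlopen(url) as response:
        print(json.load(response)["response"]["numFound"])

The "My Video" variant would keep the same fq parameters and only change q, e.g. q=title:"My Video" OR description:"My Video"; as Toke notes, against an inverted index that costs about the same.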
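
As for the laptop-versus-EC2 comparison, the model name and bogomips figures Michael quotes come straight from /proc/cpuinfo, so they can be collected on any Linux box without installing a benchmarking suite. A rough sketch (Python, standard library; Linux-only, and bogomips is only a very crude indicator, not a real benchmark):

    # Print CPU model, bogomips and logical CPU count as reported by the kernel.
    # Linux-only; the 'bogomips' key casing can vary by architecture.
    def cpu_summary(path="/proc/cpuinfo"):
        model, bogomips, threads = None, None, 0
        with open(path) as f:
            for line in f:
                if ":" not in line:
                    continue
                key, value = (part.strip() for part in line.split(":", 1))
                if key == "model name":
                    model = value
                    threads += 1            # one 'model name' line per logical CPU
                elif key.lower() == "bogomips":
                    bogomips = value
        return model, bogomips, threads

    if __name__ == "__main__":
        model, bogomips, threads = cpu_summary()
        print("model name :", model)
        print("bogomips   :", bogomips)
        print("threads    :", threads)

For an apples-to-apples indexing and query comparison, though, a workload-level test (the Lucene benchmark module or ActionGenerator mentioned above, run against the same corpus on both machines) will say far more than any single CPU figure.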