Re: Num docs

Alexander Ramos Jardim Tue, 10 Jun 2008 12:46:09 -0700

Marcus,

2008/6/10 Marcus Herou <[EMAIL PROTECTED]>:


> Well guys you are right... Still I want to have a clue about how much each
> machine stores to predict when we need more machines (measure performance
> degradation per new document). But it's harder to collect that kind of
> data.
> It sure is doable no doubt and is a normal sharding "algo" for MySQL.
>
Sorry, but I think "performance degradation per new document" isn't a good
metric, not to say a false one.
What you are really measuring is the cost in processing, memory, and I/O
read/write speed that Solr incurs, and I can't see a way to derive that
information from your document count.
Consider that the same index, under different usage policies and overall
architecture, can have a drastically different effect on system performance.

>
> The best approach, I think, is to have some background threads run X
> queries and collect the response times, throw away the n lowest/highest
> response times, and calculate an average time, which is then used for
> sharding and query load balancing.
>
Sorry? Didn't get the point...
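
[For reference, the proposal quoted above amounts to a trimmed average of
sampled response times. A minimal sketch in Python, assuming the times have
already been collected in milliseconds; the function name trimmed_mean and the
sample values are hypothetical and do not come from the thread.]

```python
from statistics import mean


def trimmed_mean(response_times_ms, n_trim):
    """Average the samples after dropping the n_trim lowest and n_trim highest."""
    if n_trim < 0:
        raise ValueError("n_trim must be non-negative")
    ordered = sorted(response_times_ms)
    if len(ordered) <= 2 * n_trim:
        raise ValueError("not enough samples left after trimming")
    # Keep only the middle of the sorted list, discarding outliers at both ends.
    kept = ordered[n_trim:len(ordered) - n_trim]
    return mean(kept)


# Hypothetical response times for one shard, in milliseconds.
samples = [12.0, 15.0, 14.0, 250.0, 13.0, 1.0, 16.0]
print(trimmed_mean(samples, 1))
```

[A per-shard value like this could then be compared across machines, but it
says nothing by itself about why a shard is slow.]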

>
> Little off topic but interesting....
> What would you guys say about a good correlation between the index size on
> disk (no stored text content) and available RAM and having good response
> times.
>
I would need to benchmark a little more to answer you.

>
> How long is a rope would you perhaps say...but I think some rule of thumb
> could be established...
>
We need to establish good metrics before we can establish good rules.

>
> One of the schemas of concern
> <fields>
>        <field name="feedId" type="integer" indexed="true" stored="false" required="true" />
>        <field name="feedItemId" type="long" indexed="true" stored="true" required="true" />
>        <field name="siteId" type="integer" indexed="true" stored="true" required="false" />
>        <field name="partnerType" type="integer" indexed="true" stored="false" required="true" />
>        <field name="uid" type="string" indexed="true" stored="false" required="true" />
>        <field name="link" type="string" indexed="true" stored="false" required="true" />
>        <field name="description" type="text" indexed="true" stored="false" required="false" />
>        <field name="title" type="text" indexed="true" stored="false" required="true" />
>        <field name="publishDate" type="date" indexed="true" stored="false" required="true" />
>        <field name="author" type="string" indexed="true" stored="false" required="false" />
>        <field name="keyWordId" type="integer" indexed="true" stored="false" required="false" multiValued="true"/>
>        <field name="category" type="integer" indexed="true" stored="false" required="false" />
>        <field name="language" type="integer" indexed="true" stored="false" required="false" />
>        <field name="country" type="integer" indexed="true" stored="false" required="false" />
>        <field name="ngramLang" type="integer" indexed="true" stored="false" required="false" />
> </fields>
>
Let me ask you something: where do you get all these IDs from? A database?
What about its access times?

>
> and a normal solr query (taken from the log):
> /select
>
> start=0&q=(title:(apple)^4+OR+description:(apple))&version=2.2&rows=15&wt=xml&sort=publishDate+desc
>
>
> //Marcus
>
>
>
>
>
> On Tue, Jun 10, 2008 at 1:15 AM, Otis Gospodnetic <
> [EMAIL PROTECTED]> wrote:
>
> > Exactly.  I think I mentioned this once before several months ago.  One
> can
> > take various hardware specs (# cores, CPU speed, FSB, RAM, etc.),
> > performance numbers, etc. and come up with a number for each server's
> > overall capacity.
> >
> >
> > As a matter of fact, I think this would be useful to have right in Solr,
> > primarily for use when allocating and sizing shards for Distributed
> Search.
> >  JIRA enhancement/feature issue?
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> > ----- Original Message ----
> > > From: Alexander Ramos Jardim <[EMAIL PROTECTED]>
> > > To: solr-user@lucene.apache.org
> > > Sent: Monday, June 9, 2008 6:42:17 PM
> > > Subject: Re: Num docs
> > >
> > > I even think that such a decision should be based on the overall
> machine
> > > performance at a given time, and not the index size. Unless you are
> > talking
> > > solely about HD space and not having any performance issues.
> > >
> > > 2008/6/7 Otis Gospodnetic :
> > >
> > > > Marcus,
> > > >
> > > >
> > > > For that you can rely on du, vmstat, iostat, top and such, too. :)
> > > >
> > > > Otis
> > > > --
> > > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > > >
> > > >
> > > > ----- Original Message ----
> > > > > From: Marcus Herou
> > > > > To: solr-user@lucene.apache.org
> > > > > Sent: Saturday, June 7, 2008 12:33:10 PM
> > > > > Subject: Re: Num docs
> > > > >
> > > > > Thanks, I wanna ask the indices how much more each shard can handle
> > > > before
> > > > > they're considered "full" and scream for a budget to get a new
> > machine :)
> > > > >
> > > > > /M
> > > > >
> > > > > On Sat, Jun 7, 2008 at 3:07 PM, Otis Gospodnetic
> > > > > wrote:
> > > > >
> > > > > > Marcus, check out the Luke request handler.  You can get it from
> > its
> > > > > > output.  It may also be possible to get *just* that number, but
> I'm
> > not
> > > > > > looking at docs/code right now to know for sure.
> > > > > >
> > > > > >  Otis
> > > > > > --
> > > > > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > > > > >
> > > > > >
> > > > > > ----- Original Message ----
> > > > > > > From: Marcus Herou
> > > > > > > To: solr-user@lucene.apache.org
> > > > > > > Sent: Saturday, June 7, 2008 5:09:20 AM
> > > > > > > Subject: Num docs
> > > > > > >
> > > > > > > Hi.
> > > > > > >
> > > > > > > Is there a way of retrieve IndexWriter.numDocs() in SOLR ?
> > > > > > >
> > > > > > > Kindly
> > > > > > >
> > > > > > > //Marcus
> > > > > > >
> > > > > > > --
> > > > > > > Marcus Herou CTO and co-founder Tailsweep AB
> > > > > > > +46702561312
> > > > > > > [EMAIL PROTECTED]
> > > > > > > http://www.tailsweep.com/
> > > > > > > http://blogg.tailsweep.com/
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Marcus Herou CTO and co-founder Tailsweep AB
> > > > > +46702561312
> > > > > [EMAIL PROTECTED]
> > > > > http://www.tailsweep.com/
> > > > > http://blogg.tailsweep.com/
> > > >
> > > >
> > >
> > >
> > > --
> > > Alexander Ramos Jardim
> >
> >
>
>
> --
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> [EMAIL PROTECTED]
> http://www.tailsweep.com/
> http://blogg.tailsweep.com/
>



-- 
Alexander Ramos Jardim
