Re: Num docs

Marcus Herou Tue, 10 Jun 2008 12:30:17 -0700

Well guys, you are right... Still, I want some idea of how much each
machine stores, so that we can predict when we need more machines (i.e. measure
the performance degradation per new document). That kind of data is harder to
collect, but it is certainly doable and is a normal sharding "algo" for MySQL.


I think the best approach is to have some background threads run X queries,
collect the response times, throw away the n lowest and n highest response
times, and calculate an average time that is then used for sharding and query
load balancing.
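The averaging step above is a trimmed mean. A minimal Java sketch, assuming the background threads have already collected the response times in milliseconds (the class and method names are made up for illustration, and the sample timings in main are invented, not measurements):

```java
import java.util.Arrays;

/**
 * Sketch of "throw away the n lowest/highest, then average":
 * sort the sampled response times, drop n from each end, average the rest.
 */
public class TrimmedMean {

    /** Mean of the samples after dropping the n smallest and the n largest values. */
    static double trimmedMeanMillis(long[] samplesMillis, int n) {
        if (n < 0 || samplesMillis.length <= 2 * n) {
            throw new IllegalArgumentException("need more than 2*n samples");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        long sum = 0;
        for (int i = n; i < sorted.length - n; i++) {
            sum += sorted[i];
        }
        return (double) sum / (sorted.length - 2 * n);
    }

    public static void main(String[] args) {
        // Hypothetical probe timings in ms; the 250 ms outlier gets discarded.
        long[] samples = {12, 15, 11, 250, 14, 13, 9, 16};
        System.out.println(trimmedMeanMillis(samples, 1)); // prints 13.5
    }
}
```

Trimming keeps a single slow outlier (a GC pause, a cold cache) from dominating the number used for load balancing; the median is an even more robust option if n is hard to choose.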

A little off topic, but interesting:
What would you guys say is a good ratio between the on-disk index size (no
stored text content) and the available RAM if you want good response times?

"How long is a rope?" you would perhaps say... but I think some rule of thumb
could be established.
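As a starting point for such a rule of thumb, the two numbers can at least be measured and compared per machine. A rough Java sketch, assuming the index lives in a local directory (the default path below is hypothetical) and that you supply the machine's RAM yourself; it only computes the ratio and makes no claim about what a "good" value is:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class IndexRamRatio {

    /** Sums the sizes of all regular files under dir, roughly what `du` reports. */
    static long directorySizeBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                return 0L; // file removed mid-walk (e.g. by a merge); skip it
                            }
                        })
                        .sum();
        }
    }

    /** On-disk index size divided by RAM; below 1.0 the index could in principle fit in memory. */
    static double indexToRamRatio(long indexBytes, long ramBytes) {
        return (double) indexBytes / ramBytes;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "solr/data/index"); // hypothetical path
        long ramBytes = 8L * 1024 * 1024 * 1024; // assumed 8 GB machine; use your own figure
        long indexBytes = directorySizeBytes(indexDir);
        System.out.printf("index = %d bytes, index/RAM = %.2f%n",
                indexBytes, indexToRamRatio(indexBytes, ramBytes));
    }
}
```

Lucene leans on the OS file cache for index files, so the RAM worth comparing against is roughly what remains after the JVM heap and other processes, not the total.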

One of the schemas in question:
<fields>
        <field name="feedId" type="integer" indexed="true" stored="false" required="true" />
        <field name="feedItemId" type="long" indexed="true" stored="true" required="true" />
        <field name="siteId" type="integer" indexed="true" stored="true" required="false" />
        <field name="partnerType" type="integer" indexed="true" stored="false" required="true" />
        <field name="uid" type="string" indexed="true" stored="false" required="true" />
        <field name="link" type="string" indexed="true" stored="false" required="true" />
        <field name="description" type="text" indexed="true" stored="false" required="false" />
        <field name="title" type="text" indexed="true" stored="false" required="true" />
        <field name="publishDate" type="date" indexed="true" stored="false" required="true" />
        <field name="author" type="string" indexed="true" stored="false" required="false" />
        <field name="keyWordId" type="integer" indexed="true" stored="false" required="false" multiValued="true"/>
        <field name="category" type="integer" indexed="true" stored="false" required="false" />
        <field name="language" type="integer" indexed="true" stored="false" required="false" />
        <field name="country" type="integer" indexed="true" stored="false" required="false" />
        <field name="ngramLang" type="integer" indexed="true" stored="false" required="false" />
</fields>

and a typical Solr query (taken from the log):
/select
start=0&q=(title:(apple)^4+OR+description:(apple))&version=2.2&rows=15&wt=xml&sort=publishDate+desc
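For reference, the logged parameters can be rebuilt into a full request URL. A small Java sketch using only the JDK; the host, port and /solr context path in main are assumptions:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class SelectQuery {

    /** The parameters of the logged /select request, in logged order. */
    static Map<String, String> loggedParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("start", "0");
        p.put("q", "(title:(apple)^4 OR description:(apple))"); // title matches boosted by 4
        p.put("version", "2.2");
        p.put("rows", "15");
        p.put("wt", "xml");
        p.put("sort", "publishDate desc"); // newest first; publishDate is indexed in the schema
        return p;
    }

    /** URL-encodes each value and joins the key=value pairs with '&'. */
    static String buildQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed local Solr instance.
        System.out.println("http://localhost:8983/solr/select?" + buildQueryString(loggedParams()));
    }
}
```

URLEncoder escapes the parentheses, colon and caret (for example `(` becomes `%28`), so the string differs from the logged form but decodes to the same parameters.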


//Marcus





On Tue, Jun 10, 2008 at 1:15 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> Exactly.  I think I mentioned this once before several months ago.  One can
> take various hardware specs (# cores, CPU speed, FSB, RAM, etc.),
> performance numbers, etc. and come up with a number for each server's
> overall capacity.
>
>
> As a matter of fact, I think this would be useful to have right in Solr,
> primarily for use when allocating and sizing shards for Distributed Search.
>  JIRA enhancement/feature issue?
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message ----
> > From: Alexander Ramos Jardim <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Monday, June 9, 2008 6:42:17 PM
> > Subject: Re: Num docs
> >
> > I even think that such a decision should be based on the overall machine
> > performance at a given time, and not the index size. Unless you are talking
> > solely about HD space and not having any performance issues.
> >
> > 2008/6/7 Otis Gospodnetic :
> >
> > > Marcus,
> > >
> > >
> > > For that you can rely on du, vmstat, iostat, top and such, too. :)
> > >
> > > Otis
> > >
> > >
> > > ----- Original Message ----
> > > > From: Marcus Herou
> > > > To: solr-user@lucene.apache.org
> > > > Sent: Saturday, June 7, 2008 12:33:10 PM
> > > > Subject: Re: Num docs
> > > >
> > > > Thanks, I wanna ask the indices how much more each shard can handle before
> > > > they're considered "full" and scream for a budget to get a new machine :)
> > > >
> > > > /M
> > > >
> > > > On Sat, Jun 7, 2008 at 3:07 PM, Otis Gospodnetic wrote:
> > > >
> > > > > Marcus, check out the Luke request handler.  You can get it from its
> > > > > output.  It may also be possible to get *just* that number, but I'm not
> > > > > looking at docs/code right now to know for sure.
> > > > >
> > > > >  Otis
> > > > >
> > > > >
> > > > > ----- Original Message ----
> > > > > > From: Marcus Herou
> > > > > > To: solr-user@lucene.apache.org
> > > > > > Sent: Saturday, June 7, 2008 5:09:20 AM
> > > > > > Subject: Num docs
> > > > > >
> > > > > > Hi.
> > > > > >
> > > > > > Is there a way to retrieve IndexWriter.numDocs() in Solr?
> > > > > >
> > > > > > Kindly
> > > > > >
> > > > > > //Marcus
> > > > > >
> > > > > > --
> > > > > > Marcus Herou CTO and co-founder Tailsweep AB
> > > > > > +46702561312
> > > > > > [EMAIL PROTECTED]
> > > > > > http://www.tailsweep.com/
> > > > > > http://blogg.tailsweep.com/
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
> > --
> > Alexander Ramos Jardim
>
>
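Regarding the original question: following Otis's pointer, numDocs can be read from the Luke request handler's output. A hedged Java sketch, assuming the handler is registered at /admin/luke on a local Solr at port 8983 and returns its default XML, in which the index section contains an element like <int name="numDocs">...</int>. The regex is a shortcut for illustration; a real client should use an XML parser:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LukeNumDocs {

    // Matches <int name="numDocs">123</int> (or a <long> variant); group 2 is the count.
    private static final Pattern NUM_DOCS =
            Pattern.compile("<(int|long) name=\"numDocs\">(\\d+)</\\1>");

    /** Extracts the first numDocs value from a Luke XML response. */
    static long parseNumDocs(String lukeXml) {
        Matcher m = NUM_DOCS.matcher(lukeXml);
        if (!m.find()) {
            throw new IllegalStateException("numDocs not found in Luke response");
        }
        return Long.parseLong(m.group(2));
    }

    /** Plain HTTP GET returning the body as UTF-8 text. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed URL; numTerms=0 asks for no top terms per field, keeping the response small.
        String url = "http://localhost:8983/solr/admin/luke?numTerms=0";
        System.out.println("numDocs = " + parseNumDocs(fetch(url)));
    }
}
```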


