And the followup question would be: if some of these documents are legitimately this large (they really do have that much text), is there a good way to still allow that to be searchable without exploding our index? These would be "text_en" type fields.
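For instance, would something like solr.LimitTokenCountFilterFactory be the right tool here? A rough sketch of the kind of schema.xml change I have in mind (field and type names are invented, and this is untested on our analyzer chain):

```xml
<!-- Sketch only: cap how many tokens per field actually get indexed,
     so a 180MB document doesn't index every token. Names are illustrative. -->
<fieldType name="text_en_capped" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Only the first 10,000 tokens are indexed; the rest are dropped from the index. -->
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10000"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Alternatively (or additionally), copy only a bounded prefix of the raw text
     into the searchable field, leaving the full text stored elsewhere. -->
<copyField source="body" dest="body_search" maxChars="1000000"/>
```

Either way the full text could stay stored for retrieval while the index only carries a bounded prefix; I'd be curious whether people consider one approach safer than the other.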
On Mon, Jun 2, 2014 at 6:09 AM, Joe Gresock <jgres...@gmail.com> wrote:

> So, we're definitely running into some very large documents (180MB, for example). I haven't run the analysis on the other 2 shards yet, but this could definitely be our problem.
>
> Is there any conventional wisdom on a good "maximum size" for your indexed fields? Of course it will vary for each system, but assuming a heap of 10g, does anyone have past experience in limiting their field sizes?
>
> Our caches are set to 128.
>
> On Sun, Jun 1, 2014 at 8:32 AM, Joe Gresock <jgres...@gmail.com> wrote:
>
>> These are some good ideas. The "huge document" idea could add up, since I think the shard1 index is a little larger (32.5GB on disk instead of 31.9GB), so it is possible there are one or two really big ones that are getting loaded into memory there.
>>
>> Btw, I did find an article on Solr document routing (http://searchhub.org/2013/06/13/solr-cloud-document-routing/), so I don't think that our ID structure is a problem in itself. But I will follow up on the large document idea.
>>
>> I used this article (https://support.datastax.com/entries/38367716-Solr-Configuration-Best-Practices-and-Troubleshooting-Tips) to find the index heap and disk usage: http://localhost:8983/solr/admin/cores?action=STATUS&memory=true
>>
>> Though looking at the data index directory on disk basically said the same thing.
>>
>> I am pretty sure we're using the smart round-robining client, but I will double-check on Monday.
>>
>> We have been using collectd and Graphite to monitor our VMs, as well as jvisualvm, though we haven't tried SPM.
>>
>> Thanks for all the ideas, guys.
>>
>> On Sat, May 31, 2014 at 11:54 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
>>
>>> Hi Joe,
>>>
>>> How do you know all 3 shards are roughly the same size? Can you share what you run/see that shows you that?
>>>
>>> Are you sure queries are evenly distributed? Something like SPM (http://sematext.com/spm/) should give you insight into that.
>>>
>>> How big are your caches?
>>>
>>> Otis
>>> --
>>> Performance Monitoring * Log Analytics * Search Analytics
>>> Solr & Elasticsearch Support * http://sematext.com/
>>>
>>> On Sat, May 31, 2014 at 5:54 PM, Joe Gresock <jgres...@gmail.com> wrote:
>>>
>>>> Interesting thought about the routing. Our document ids are in 3 parts:
>>>>
>>>> <10-digit identifier>!<epoch timestamp>!<format>
>>>>
>>>> e.g., 5/12345678!130000025603!TEXT
>>>>
>>>> Each object has an identifier, and there may be multiple versions of the object, hence the timestamp. We like to be able to pull back all of the versions of an object at once, hence the routing scheme.
>>>>
>>>> The nature of the identifier is that a great many of them begin with a certain number. I'd be interested to know more about the hashing scheme used for the document routing. Perhaps the first character gives it more weight as to which shard it lands in?
>>>>
>>>> It seems strange that certain of the most highly-searched documents would happen to fall on this shard, but you may be onto something. We'll scrape through some non-distributed queries and see what we can find.
>>>>
>>>> On Sat, May 31, 2014 at 1:47 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>>
>>>>> This is very weird.
>>>>>
>>>>> Are you sure that all the Java versions are identical? And all the JVM parameters are the same? Grasping at straws here.
>>>>>
>>>>> More grasping at straws: I'm a little suspicious that you are using routing. You say that the indexes are about the same size, but is it possible that your routing is somehow loading the problem shard abnormally? By that I mean somehow the documents on that shard are different, or have a drastically higher number of hits than the other shards?
>>>>>
>>>>> You can fire queries at shards with &distrib=false and NOT have them go to other shards; perhaps if you can isolate the problem queries, that might shed some light on the problem.
>>>>>
>>>>> Best
>>>>> er...@baffled.com
>>>>>
>>>>> On Sat, May 31, 2014 at 8:33 AM, Joe Gresock <jgres...@gmail.com> wrote:
>>>>>
>>>>>> It has taken as little as 2 minutes to happen the last time we tried. It basically happens upon high query load (peak user hours during the day). When we reduce functionality by disabling most searches, it stabilizes. So it really is only on high query load. Our ingest rate is fairly low.
>>>>>>
>>>>>> It happens no matter how many nodes in the shard are up.
>>>>>>
>>>>>> Joe
>>>>>>
>>>>>> On Sat, May 31, 2014 at 11:04 AM, Jack Krupansky <j...@basetechnology.com> wrote:
>>>>>>
>>>>>>> When you restart, how long does it take to hit the problem? And how much query or update activity is happening in that time? Is there any other activity showing up in the log?
>>>>>>>
>>>>>>> If you bring up only a single node in that problematic shard, do you still see the problem?
>>>>>>>
>>>>>>> -- Jack Krupansky
>>>>>>>
>>>>>>> -----Original Message----- From: Joe Gresock
>>>>>>> Sent: Saturday, May 31, 2014 9:34 AM
>>>>>>> To: solr-user@lucene.apache.org
>>>>>>> Subject: Uneven shard heap usage
>>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> I'm trying to figure out why one shard of an evenly-distributed 3-shard cluster would suddenly start running out of heap space, after 9+ months of stable performance. We're using the "!" delimiter in our ids to distribute the documents, and indeed the disk sizes of our shards are very similar (31-32GB on disk per replica).
>>>>>>>
>>>>>>> Our setup is:
>>>>>>> 9 VMs with 16GB RAM, 8 vcpus (with a 4:1 oversubscription ratio, so basically 2 physical CPUs), 24GB disk
>>>>>>> 3 shards, 3 replicas per shard (1 leader, 2 replicas, whatever). We reserve 10g heap for each Solr instance.
>>>>>>> Also 3 zookeeper VMs, which are very stable
>>>>>>>
>>>>>>> Since the troubles started, we've been monitoring all 9 with jvisualvm, and shards 2 and 3 keep a steady amount of heap space reserved, always having horizontal lines (with some minor gc). They're using 4-5GB heap, and when we force gc using jvisualvm, they drop to 1GB usage. Shard 1, however, quickly has a steep slope, and eventually has concurrent mode failures in the gc logs, requiring us to restart the instances when they can no longer do anything but gc.
>>>>>>>
>>>>>>> We've tried ruling out physical host problems by moving all 3 shard 1 replicas to different hosts that are underutilized, however we still get the same problem. We'll still be working on ruling out infrastructure issues, but I wanted to ask the questions here in case it makes sense:
>>>>>>>
>>>>>>> * Does it make sense that all the replicas on one shard of a cluster would have heap problems, when the other shard replicas do not, assuming a fairly even data distribution?
>>>>>>> * One thing we changed recently was to make all of our fields stored, instead of only half of them. This was to support atomic updates. Can stored fields, even though lazily loaded, cause problems like this?
>>>>>>>
>>>>>>> Thanks for any input,
>>>>>>> Joe
>>>>>>>
>>>>>>> --
>>>>>>> I know what it is to be in need, and I know what it is to have plenty. I have learned the secret of being content in any and every situation, whether well fed or hungry, whether living in plenty or in want. I can do all this through him who gives me strength. *-Philippians 4:12-13*
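P.S. Since we recently made all of our fields stored, I also plan to double-check that lazy field loading is actually enabled, and to look at how our documentCache interacts with these huge stored documents. As I understand it, the relevant bits of solrconfig.xml (in the <query> section) look something like this; the sizes shown are our current settings, not a recommendation:

```xml
<!-- Load stored fields lazily, so a hit doesn't materialize every stored
     field of a large document just to return a few of them. -->
<enableLazyFieldLoading>true</enableLazyFieldLoading>

<!-- The documentCache holds whole Lucene documents; with 180MB stored
     documents, even a small cache like ours could pin a lot of heap. -->
<documentCache class="solr.LRUCache" size="128" initialSize="128" autowarmCount="0"/>
```

If anyone knows whether the documentCache respects lazy loading or ends up holding the full stored document, that would help us reason about the heap growth on shard 1.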