Joe - there shouldn't really be a problem *indexing* these fields:
remember that all the terms are spread across the index, so there is
really no storage difference between one 180MB document and 180 1MB
documents from an indexing perspective.
Making the field "stored" is more likely to lead to a problem, although
it's still a bit of a mystery exactly what's going on. Do they need to
be stored? For example: do you highlight the entire field? Still, 180MB
shouldn't necessarily lead to heap space problems, but one thing you
could play with is reducing the cache sizes on that node: if you had
very large (in terms of numbers of documents) caches, and a lot of the
documents were big, that could lead to heap problems. But this is all
just guessing.
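If you want to experiment, the caches live in solrconfig.xml; something like the fragment below (classes and sizes purely illustrative, not a recommendation) is where I'd start turning knobs. The documentCache is the one that holds stored fields, so it matters most when individual documents are huge:

```xml
<!-- solrconfig.xml: smaller caches mean fewer big documents pinned in heap.
     All sizes here are illustrative only. -->
<filterCache class="solr.FastLRUCache" size="128" initialSize="128" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="128" initialSize="128" autowarmCount="0"/>
<!-- documentCache caches stored fields; it cannot be autowarmed -->
<documentCache class="solr.LRUCache" size="64" initialSize="64" autowarmCount="0"/>
```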
-Mike
On 6/2/2014 6:13 AM, Joe Gresock wrote:
And the followup question would be.. if some of these documents are
legitimately this large (they really do have that much text), is there a
good way to still allow that to be searchable and not explode our index?
These would be "text_en" type fields.
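One knob I've seen for this (just an idea, not something we've tried) is capping how many tokens actually get indexed per field with a LimitTokenCountFilterFactory in the field type. The analyzer chain and maxTokenCount below are only a sketch:

```xml
<!-- schema.xml: index at most the first N tokens of huge text fields.
     The maxTokenCount value and analyzer chain are illustrative. -->
<fieldType name="text_en_capped" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="100000"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Everything past the cap would simply not be searchable, which may or may not be acceptable for these documents.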
On Mon, Jun 2, 2014 at 6:09 AM, Joe Gresock <jgres...@gmail.com> wrote:
So, we're definitely running into some very large documents (180MB, for
example). I haven't run the analysis on the other 2 shards yet, but this
could definitely be our problem.
Is there any conventional wisdom on a good "maximum size" for your indexed
fields? Of course it will vary for each system, but assuming a heap of
10g, does anyone have past experience in limiting their field sizes?
Our caches are set to 128.
On Sun, Jun 1, 2014 at 8:32 AM, Joe Gresock <jgres...@gmail.com> wrote:
These are some good ideas. The "huge document" idea could add up, since
I think the shard1 index is a little larger (32.5GB on disk instead of
31.9GB), so it is possible there's one or 2 really big ones that are
getting loaded into memory there.
Btw, I did find an article on the Solr document routing (
http://searchhub.org/2013/06/13/solr-cloud-document-routing/), so I
don't think that our ID structure is a problem in itself. But I will
follow up on the large document idea.
I used this article (
https://support.datastax.com/entries/38367716-Solr-Configuration-Best-Practices-and-Troubleshooting-Tips)
to find the index heap and disk usage:
http://localhost:8983/solr/admin/cores?action=STATUS&memory=true
Though looking at the data index directory on disk basically said the
same thing.
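In case it helps anyone else, the piece of the STATUS response I was reading is the per-core "index" section; a rough sketch of its shape (core name and numbers made up):

```python
# Rough shape of the core admin STATUS output (action=STATUS&memory=true);
# the core name and numbers below are made up for illustration.
status = {
    "status": {
        "collection1_shard1_replica1": {
            "index": {"numDocs": 1200000, "sizeInBytes": 34905292800}
        }
    }
}

for core, info in status["status"].items():
    idx = info["index"]
    print(f"{core}: {idx['sizeInBytes'] / 2**30:.1f} GB on disk, "
          f"{idx['numDocs']} docs")
# → collection1_shard1_replica1: 32.5 GB on disk, 1200000 docs
```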
I am pretty sure we're using the smart round-robining client, but I will
double check on Monday.
We have been using CollectD and graphite to monitor our VMs, as well as
jvisualvm, though we haven't tried SPM.
Thanks for all the ideas, guys.
On Sat, May 31, 2014 at 11:54 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
Hi Joe,
Are you sure all 3 shards are roughly the same size, and how did you
verify that? Can you share what you run/see that shows you that?
Are you sure queries are evenly distributed? Something like SPM
<http://sematext.com/spm/> should give you insight into that.
How big are your caches?
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
On Sat, May 31, 2014 at 5:54 PM, Joe Gresock <jgres...@gmail.com> wrote:
Interesting thought about the routing. Our document ids are in 3 parts:
<10-digit identifier>!<epoch timestamp>!<format>
e.g., 5/12345678!130000025603!TEXT
Each object has an identifier, and there may be multiple versions of the object, hence the timestamp. We like to be able to pull back all of the versions of an object at once, hence the routing scheme.
The nature of the identifier is that a great many of them begin with a certain number. I'd be interested to know more about the hashing scheme used for the document routing. Perhaps the first character gives it more weight as to which shard it lands in?
It seems strange that certain of the most highly-searched documents would happen to fall on this shard, but you may be onto something. We'll scrape through some non-distributed queries and see what we can find.
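For my own sanity, my mental model of the composite-id routing is the sketch below: the whole route key (everything before the first "!") gets hashed, so the first character alone shouldn't dominate. The hash here is a stand-in purely for illustration; I gather Solr actually uses MurmurHash3 and maps hash ranges to shards:

```python
import zlib

NUM_SHARDS = 3  # matches our 3-shard cluster

def shard_for(doc_id, num_shards=NUM_SHARDS):
    # Composite-id routing: only the route key before the first "!"
    # decides the shard, so every version of an object co-locates.
    route_key = doc_id.split("!", 1)[0]
    # zlib.crc32 is a stand-in for illustration only; Solr really hashes
    # the route key with MurmurHash3 into per-shard hash ranges.
    return zlib.crc32(route_key.encode("utf-8")) % num_shards

# All versions of the same object land on the same shard:
assert (shard_for("1234567890!130000025603!TEXT")
        == shard_for("1234567890!130000031999!PDF"))
```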
On Sat, May 31, 2014 at 1:47 PM, Erick Erickson <erickerick...@gmail.com> wrote:
This is very weird.
Are you sure that all the Java versions are identical? And all the JVM parameters are the same? Grasping at straws here.
More grasping at straws: I'm a little suspicious that you are using routing. You say that the indexes are about the same size, but is it possible that your routing is somehow loading the problem shard abnormally? By that I mean somehow the documents on that shard are different, or have a drastically higher number of hits than the other shards?
You can fire queries at shards with &distrib=false and NOT have them go to other shards; perhaps if you can isolate the problem queries, that might shed some light on the problem.
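Something like this is what I mean; you hit one core directly and keep distrib=false on the request (host, core name, and query below are just examples):

```python
from urllib.parse import urlencode

# Build a non-distributed query against a single core. With
# distrib=false the request is answered only by that core and does not
# fan out to the other shards. Host/core/query are examples only.
params = {"q": "text_en:suspect_term", "distrib": "false", "rows": 10}
url = ("http://localhost:8983/solr/collection1_shard1_replica1/select?"
       + urlencode(params))
print(url)
```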
Best
er...@baffled.com
On Sat, May 31, 2014 at 8:33 AM, Joe Gresock <jgres...@gmail.com> wrote:
It has taken as little as 2 minutes to happen the last time we tried. It basically happens upon high query load (peak user hours during the day). When we reduce functionality by disabling most searches, it stabilizes. So it really is only on high query load. Our ingest rate is fairly low.
It happens no matter how many nodes in the shard are up.
Joe
On Sat, May 31, 2014 at 11:04 AM, Jack Krupansky <j...@basetechnology.com> wrote:
When you restart, how long does it take to hit the problem? And how much query or update activity is happening in that time? Is there any other activity showing up in the log?
If you bring up only a single node in that problematic shard, do you still see the problem?
-- Jack Krupansky
-----Original Message----- From: Joe Gresock
Sent: Saturday, May 31, 2014 9:34 AM
To: solr-user@lucene.apache.org
Subject: Uneven shard heap usage
Hi folks,
I'm trying to figure out why one shard of an evenly-distributed 3-shard cluster would suddenly start running out of heap space, after 9+ months of stable performance. We're using the "!" delimiter in our ids to distribute the documents, and indeed the disk sizes of our shards are very similar (31-32GB on disk per replica).
Our setup is:
9 VMs with 16GB RAM, 8 vcpus (with a 4:1 oversubscription ratio, so basically 2 physical CPUs), 24GB disk
3 shards, 3 replicas per shard (1 leader, 2 replicas, whatever). We reserve 10g heap for each solr instance.
Also 3 zookeeper VMs, which are very stable
Since the troubles started, we've been monitoring all 9 with jvisualvm, and shards 2 and 3 keep a steady amount of heap space reserved, always having horizontal lines (with some minor gc). They're using 4-5GB heap, and when we force gc using jvisualvm, they drop to 1GB usage. Shard 1, however, quickly has a steep slope, and eventually has concurrent mode failures in the gc logs, requiring us to restart the instances when they can no longer do anything but gc.
We've tried ruling out physical host problems by moving all 3 Shard 1 replicas to different hosts that are underutilized; however, we still get the same problem. We'll still be working on ruling out infrastructure issues, but I wanted to ask the questions here in case it makes sense:
* Does it make sense that all the replicas on one shard of a cluster would have heap problems, when the other shard replicas do not, assuming a fairly even data distribution?
* One thing we changed recently was to make all of our fields stored, instead of only half of them. This was to support atomic updates. Can stored fields, even though lazily loaded, cause problems like this?
Thanks for any input,
Joe
--
I know what it is to be in need, and I know what it is to have plenty. I have learned the secret of being content in any and every situation, whether well fed or hungry, whether living in plenty or in want. I can do all this through him who gives me strength. *-Philippians 4:12-13*