Re: Uneven shard heap usage

2014-06-02 Thread Michael Sokolov
You'll get very different performance profiles from the various highlighters (we saw up to a 15x speed difference in our queries on average by changing highlighters). The default one re-analyzes the entire stored document in memory and is the slowest, but provides the most faithful match to the
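The sketch below builds a Solr select request that bounds how much highlighting work a query can trigger. It assumes a Solr 4.x core at a hypothetical URL and a hypothetical stored text field named `body`. `hl.maxAnalyzedChars` caps how many characters of each field the default highlighter re-analyzes (Solr's default is 51200). `hl.useFastVectorHighlighter` switches to the FastVectorHighlighter, which needs the field indexed with termVectors, termPositions and termOffsets. The values shown are placeholders, not tuned recommendations.

```python
from urllib.parse import urlencode

# Hypothetical Solr 4.x endpoint; substitute your own host and core.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def highlight_query_url(q: str, field: str = "body",
                        max_analyzed_chars: int = 51200,
                        use_fvh: bool = False,
                        highlight: bool = True) -> str:
    """Build a select URL whose highlighting cost is bounded.

    hl.maxAnalyzedChars limits how much of each stored field the default
    highlighter re-analyzes. hl.useFastVectorHighlighter uses stored term
    vectors instead of re-analysis (the field must have termVectors,
    termPositions and termOffsets enabled). hl=false turns highlighting off.
    """
    params = [("q", q), ("wt", "json")]
    if highlight:
        params += [
            ("hl", "true"),
            ("hl.fl", field),  # "body" is a hypothetical field name
            ("hl.maxAnalyzedChars", str(max_analyzed_chars)),
            ("hl.useFastVectorHighlighter", "true" if use_fvh else "false"),
        ]
    else:
        params.append(("hl", "false"))
    return SOLR_SELECT + "?" + urlencode(params)


print(highlight_query_url("body:heap", max_analyzed_chars=100000))
```

Capping the analyzed characters trades faithful highlighting deep inside very large fields for bounded CPU and memory per request.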

Re: Uneven shard heap usage

2014-06-02 Thread Joe Gresock
So, we were finally able to reproduce the heap overload behavior with a stress test of a query that highlighted the large fields we found. We'll have to play around with the highlighting settings, but for now we've disabled the highlighting on this query (which is a canned query that doesn't even
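A stress test like this can be approximated with a small concurrent load generator. The sketch below is a minimal version. `fetch` is any callable that issues one request, for example a function wrapping an HTTP call for the canned query. It is injected here so the harness runs without a live Solr. The function and field names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from typing import Callable


def run_load(fetch: Callable[[], object], total_requests: int,
             concurrency: int) -> dict:
    """Call `fetch` total_requests times from `concurrency` worker threads.

    Returns request/error counts and latency stats in seconds, so a node
    that slows down or fails under heap pressure stands out.
    """
    def timed_call(_: int) -> float:
        start = time.perf_counter()
        try:
            fetch()
        except Exception:
            return -1.0  # sentinel latency marking a failed request
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    ok = [r for r in results if r >= 0]
    return {
        "requests": total_requests,
        "errors": len(results) - len(ok),
        "median_s": median(ok) if ok else float("nan"),
        "max_s": max(ok) if ok else float("nan"),
    }


# Stand-in fetch that only sleeps; replace it with a real request to the query.
stats = run_load(lambda: time.sleep(0.01), total_requests=50, concurrency=10)
print(stats)
```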

Re: Uneven shard heap usage

2014-06-02 Thread Erick Erickson
Joe: One thing to add, if you're returning that doc (or perhaps even some fields, this bit is still something of a mystery to me) then the whole 180M may be being decompressed. Since 4.1 the stored fields have been compressed to disk by default. That said, this is only true if the docs in question
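The following toy model shows why fetching even one small stored field of a huge document can be expensive. It assumes, as a simplification, that all stored fields of a document are serialized and compressed as one unit. Lucene 4.1+ actually uses LZ4 over chunks of documents in its compressing stored-fields format. This sketch swaps in zlib purely for illustration. The point is only that the whole unit must be decompressed before any one field can be read.

```python
import json
import zlib


def store_document(fields: dict) -> bytes:
    """Serialize all stored fields of one document and compress them together."""
    return zlib.compress(json.dumps(fields).encode("utf-8"))


def load_field(blob: bytes, name: str):
    """Return (field value, bytes decompressed) for one field of a stored record.

    The entire record is decompressed first, so the cost scales with the
    whole document rather than with the size of the requested field.
    """
    raw = zlib.decompress(blob)
    return json.loads(raw)[name], len(raw)


doc = {"id": "doc-1", "title": "short", "body": "lorem ipsum " * 100_000}
blob = store_document(doc)
title, bytes_decompressed = load_field(blob, "title")
print(title, len(blob), bytes_decompressed)  # tiny field, >1.2 MB decompressed
```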

Re: Uneven shard heap usage

2014-06-02 Thread Michael Sokolov
Joe - there shouldn't really be a problem *indexing* these fields: remember that all the terms are spread across the index, so there is really no storage difference between one 180MB document and 180 documents of 1MB each from an indexing perspective. Making the field "stored" is more likely to lead
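A toy inverted index illustrates the indexing side of that argument. The total number of term positions is the same whether the text lives in one document or is split across many. What changes is the per-document bookkeeping. This is a simplification: real Lucene postings also carry per-document frequencies, norms and other data. The tokenizer here is plain whitespace splitting.

```python
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Map term -> {doc_id: [positions]} using whitespace tokenization."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def total_positions(index: dict) -> int:
    """Count every (term, doc, position) entry in the index."""
    return sum(len(p) for postings in index.values() for p in postings.values())


words = ("alpha beta gamma delta " * 2_000).split()  # 8000 tokens
one_big = {"big": " ".join(words)}
# The same text split into 8 equally sized smaller documents.
size = len(words) // 8
many_small = {f"part{i}": " ".join(words[i * size:(i + 1) * size]) for i in range(8)}

print(total_positions(build_index(one_big)), total_positions(build_index(many_small)))
```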

Re: Uneven shard heap usage

2014-06-02 Thread Joe Gresock
And the follow-up question would be: if some of these documents are legitimately this large (they really do have that much text), is there a good way to still allow that to be searchable and not explode our index? These would be "text_en" type fields. On Mon, Jun 2, 2014 at 6:09 AM, Joe Gresock
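One possible approach, in the spirit of the one-big-versus-many-small comparison earlier in the thread, is to split very large texts into bounded chunks. Each chunk is indexed as its own document that points back to the original. The Python sketch below assumes hypothetical field names `id`, `parent_id`, `chunk_no` and `text`. It is an illustration, not a schema anyone in the thread proposed.

```python
from typing import Iterator, List


def chunk_text(text: str, max_chars: int) -> Iterator[str]:
    """Yield consecutive pieces of at most max_chars, preferring to break after a space."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            cut = text.rfind(" ", start, end)  # last space inside the window
            if cut > start:
                end = cut + 1  # keep the space with the earlier chunk
        yield text[start:end]
        start = end


def to_chunk_docs(parent_id: str, text: str, max_chars: int = 1_000_000) -> List[dict]:
    """Build one document per chunk (hypothetical field names)."""
    return [
        {"id": f"{parent_id}_{n}", "parent_id": parent_id, "chunk_no": n, "text": piece}
        for n, piece in enumerate(chunk_text(text, max_chars))
    ]


docs = to_chunk_docs("doc-42", "word " * 1000, max_chars=64)
print(len(docs), docs[0]["id"])
```

A trade-off of this design is that phrase queries spanning a chunk boundary will not match, and results may need grouping by `parent_id` to show one hit per original document.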

Re: Uneven shard heap usage

2014-06-02 Thread Joe Gresock
So, we're definitely running into some very large documents (180MB, for example). I haven't run the analysis on the other 2 shards yet, but this could definitely be our problem. Is there any conventional wisdom on a good "maximum size" for your indexed fields? Of course it will vary for each sys
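For the remaining shard analysis, this Python sketch flags documents whose text exceeds a size threshold. It assumes you can stream `(id, text)` pairs out of each shard, for example from an export. How those pairs are obtained is left open, and the threshold is an arbitrary placeholder.

```python
import heapq
from typing import Iterable, List, Tuple


def oversized(docs: Iterable[Tuple[str, str]], threshold_bytes: int,
              top_n: int = 10) -> List[Tuple[int, str]]:
    """Return up to top_n (size_in_bytes, doc_id) pairs above threshold, largest first.

    Sizes are measured as UTF-8 bytes; the input is consumed lazily, so
    only the current top_n candidates are kept in memory.
    """
    sizes = ((len(text.encode("utf-8")), doc_id) for doc_id, text in docs)
    return heapq.nlargest(top_n, (s for s in sizes if s[0] > threshold_bytes))


sample = [("a", "x" * 10), ("b", "x" * 100), ("c", "x" * 50)]
print(oversized(sample, threshold_bytes=20))
```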

Re: Uneven shard heap usage

2014-06-01 Thread Joe Gresock
These are some good ideas. The "huge document" idea could add up, since I think the shard1 index is a little larger (32.5GB on disk instead of 31.9GB), so it is possible there's one or 2 really big ones that are getting loaded into memory there. Btw, I did find an article on the Solr document rou

Re: Uneven shard heap usage

2014-05-31 Thread Otis Gospodnetic
Hi Joe, Are you/how are you sure all 3 shards are roughly the same size? Can you share what you run/see that shows you that? Are you sure queries are evenly distributed? Something like SPM should give you insight into that. How big are your caches? Otis -- Performan

Re: Uneven shard heap usage

2014-05-31 Thread Michael Sokolov
Is it possible that all your requests are routed to that single shard? I.e. you are not using the smart client that round-robins requests? I think that could cause all of the merging of results to be done on a single node. Also - is it possible you have a "bad" document in that shard? Like o
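The routing question matters because the node that receives a distributed query also merges the per-shard results. SolrJ's cloud-aware client (CloudSolrServer in 4.x) spreads requests across live nodes for you. If requests are sent over plain HTTP instead, a minimal round-robin looks like the sketch below. The node URLs are hypothetical.

```python
import itertools
from collections import Counter
from typing import Iterator, List

# Hypothetical node base URLs; substitute your own.
NODES: List[str] = [
    "http://solr1:8983/solr",
    "http://solr2:8983/solr",
    "http://solr3:8983/solr",
]


def round_robin(nodes: List[str]) -> Iterator[str]:
    """Cycle through nodes so each receives an equal share of requests."""
    return itertools.cycle(nodes)


picker = round_robin(NODES)
sent_to = Counter(next(picker) for _ in range(300))
print(sent_to)  # each node receives 100 requests
```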

Re: Uneven shard heap usage

2014-05-31 Thread Joe Gresock
Interesting thought about the routing. Our document ids are in 3 parts: <10-digit identifier>!! e.g., 5/12345678!13025603!TEXT Each object has an identifier, and there may be multiple versions of the object, hence the timestamp. We like to be able to pull back all of the versions of an obj
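The Python sketch below approximates how Solr's compositeId router might place such IDs. Each `!`-separated part is hashed with MurmurHash3 (x86, 32-bit, seed 0). Two-part IDs take the top 16 bits of the hash from the first part. Three-part IDs take the top 8 bits from the first part, the next 8 from the second and the low 16 from the third. Each shard owns a contiguous hash range. This follows our understanding of the 4.x router and ignores the `/bits` syntax, so treat it as an approximation rather than a bit-exact reimplementation. The practical consequence is that all IDs sharing a first part share the top 8 bits of their hash. They therefore cluster in the same region of the hash range, usually on the same shard.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, returned as an unsigned int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    nblocks = len(data) // 4
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    tail = data[4 * nblocks:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    h ^= len(data)
    # Final avalanche (fmix32).
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def composite_hash(doc_id: str) -> int:
    """Approximate compositeId hash: 16/16 bits for a!b, 8/8/16 bits for a!b!c."""
    parts = doc_id.split("!")
    if len(parts) > 3:
        raise ValueError("sketch only handles up to two '!' separators")
    h = [murmur3_x86_32(p.encode("utf-8")) for p in parts]
    if len(parts) == 1:
        return h[0]
    if len(parts) == 2:
        return (h[0] & 0xFFFF0000) | (h[1] & 0x0000FFFF)
    return (h[0] & 0xFF000000) | (h[1] & 0x00FF0000) | (h[2] & 0x0000FFFF)


def shard_for(hash32: int, num_shards: int) -> int:
    """Index of the (roughly equal) hash range containing hash32.

    Solr's ranges run over signed ints starting at Integer.MIN_VALUE;
    flipping the sign bit maps that order onto 0..2**32-1.
    """
    return ((hash32 ^ 0x80000000) * num_shards) >> 32


# Hypothetical IDs: the first two share an object identifier.
for doc_id in ["objA!1401600000!TEXT", "objA!1401700000!PDF", "objB!1401600000!TEXT"]:
    h = composite_hash(doc_id)
    print(doc_id, hex(h), "shard", shard_for(h, 3))
```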

Re: Uneven shard heap usage

2014-05-31 Thread Erick Erickson
This is very weird. Are you sure that all the Java versions are identical? And all the JVM parameters are the same? Grasping at straws here. More grasping at straws: I'm a little suspicious that you are using routing. You say that the indexes are about the same size, but is it possible that yo

Re: Uneven shard heap usage

2014-05-31 Thread Joe Gresock
It has taken as little as 2 minutes to happen the last time we tried. It basically happens upon high query load (peak user hours during the day). When we reduce functionality by disabling most searches, it stabilizes. So it really is only on high query load. Our ingest rate is fairly low. It h

Re: Uneven shard heap usage

2014-05-31 Thread Jack Krupansky
When you restart, how long does it take to hit the problem? And how much query or update activity is happening in that time? Is there any other activity showing up in the log? If you bring up only a single node in that problematic shard, do you still see the problem? -- Jack Krupansky -