Have you considered just putting some killer queries in the firstSearcher
and newSearcher sections of solrconfig.xml? By "killer" I mean even a
single query that searches on all the fields you care about, facets on a
bunch of fields, and sorts by a bunch of fields. These can even all be in
the same query; the point is to fill up the lower-level caches before
serving searchers.

If you're wondering about the difference: firstSearcher queries are
executed when you boot the server, so by definition there won't be
anything useful in the caches configured in solrconfig.xml to autowarm
from.

newSearcher queries are fired after a commit happens, so there is
probably a bunch of autowarming data that can be carried over from the
caches, per their "autowarmCount" settings.
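As a concrete sketch, warming entries in solrconfig.xml might look like the
following (the field names here are illustrative, not from any real schema):

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- one "killer" query: searches, facets, and sorts across the
         fields you care about, to fill the lower-level caches -->
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>  <!-- illustrative field -->
      <str name="facet.field">author</str>    <!-- illustrative field -->
      <str name="sort">price asc</str>        <!-- illustrative field -->
    </lst>
  </arr>
</listener>

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- can usually be lighter, since cache autowarming already
         carries data over after a commit -->
    <lst>
      <str name="q">*:*</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>
```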

FWIW,
Erick


On Mon, Oct 21, 2013 at 7:34 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi Michael,
>
> I agree with Shawn, don't listen to Peter ;) .... but only this once -
> he's a smart guy, as you can see in list archives.
> And I disagree with Shawn..... again, only just this once and only
> somewhat. :)  Because:
>
> In general, Shawn's advice is correct, but we have no way of knowing
> your particular details.  To illustrate the point, let me use an
> extreme case where you have just one query that you hammer your
> servers with.  Your Solr caches will be well utilized and your servers
> will not really need a lot of memory to cache your 100 GB index,
> because only a small portion of it will ever be accessed.  Of course,
> this is an extreme case and not realistic, but I think it helps one
> understand how, as the number of distinct queries grows (and thus also
> the number of distinct documents being matched and returned), the need
> for more and more memory goes up.  So the question is where exactly
> your particular application falls.
>
> You mentioned stress testing.  Just like you have a real index there
> (I'm assuming), you need to have your real queries, too - real
> volume, real diversity, real rate, real complexity, real or as close
> to real everything.
>
> Since you are using SPM, you should be able to go to various graphs in
> SPM and look for a little ambulance icon above each graph.  Use that
> to assemble a message with N graphs you want us to look at and we'll
> be able to help more.  Graphs that may be of interest here are your
> Solr cache graphs, disk IO, and memory graphs -- taken during your
> realistic stress testing, of course.  You can then send that message
> directly to solr-user, assuming your SPM account email address is
> subscribed to the list.  Or you can paste it into a new email, up to
> you.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On Mon, Oct 21, 2013 at 11:07 AM, Shawn Heisey <s...@elyograg.org> wrote:
> > On 10/21/2013 8:03 AM, michael.boom wrote:
> >> I'm using the m3.xlarge server with 15G RAM, but my index size is over
> 100G,
> >> so I guess running the above command would eat up all available
> >> memory.
> >
> > With a 100GB index, I would want a minimum server memory size of 64GB,
> > and I would much prefer 128GB.  If you shard your index, then each
> > machine will require less memory, because each one will have less of the
> > index onboard.  Running a big Solr install is usually best handled on
> > bare metal, because it loves RAM, and getting a lot of memory in a
> > virtual environment is quite expensive.  It's expensive on bare
> > metal too, but unlike Amazon, more memory doesn't increase your monthly
> > cost.
> >
> > With only 15GB total RAM and an index that big, you're probably giving
> > at least half of your RAM to Solr, leaving *very* little for the OS disk
> > cache, compared to your index size.  The ideal cache size is the same as
> > your index size, but you can almost always get away with less.
> >
> > http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
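To put rough numbers on that (the 15 GB and 100 GB figures are from this
thread; the heap size is an assumption, per the "at least half" estimate):

```python
# Back-of-the-envelope OS disk cache sizing for the setup in this thread.
total_ram_gb = 15    # m3.xlarge
solr_heap_gb = 8     # hypothetical -Xmx, roughly half of total RAM
index_size_gb = 100

os_cache_gb = total_ram_gb - solr_heap_gb
coverage = os_cache_gb / index_size_gb
print(f"{os_cache_gb} GB for the OS disk cache -> ~{coverage:.0%} of the index")
# → 7 GB for the OS disk cache -> ~7% of the index
```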
> >
> > If you try the "cat" trick with your numbers, it's going to take forever
> > every time you run it, it will kill your performance while it's
> > happening, and only the last few GB that it reads will remain in the OS
> > disk cache.  Chances are that it will be the wrong part of the index,
> too.
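For reference, the "cat" trick mentioned above is just reading every index
file so its pages land in the OS disk cache; the path below is hypothetical
and must be adjusted to the core's actual data directory:

```shell
# Hypothetical index location -- substitute your core's data dir.
INDEX_DIR="${SOLR_INDEX_DIR:-/var/solr/data/collection1/index}"

if [ -d "$INDEX_DIR" ]; then
  # Reading the segment files pulls them into the OS page cache;
  # the bytes themselves are discarded.
  cat "$INDEX_DIR"/* > /dev/null
else
  echo "index directory not found: $INDEX_DIR" >&2
fi
```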
> >
> > You only want to cat your entire index if you have enough free RAM to
> > *FIT* your entire index.  If you *DO* have that much free memory (which
> > for you would require a total RAM size of about 128GB), then the first
> > time will take quite a while, but every time you do it after that, it
> > will happen nearly instantly, because it will not have to actually read
> > the disk at all.
> >
> > You could try only doing the cat on certain index files, but when you
> > don't have enough cache for the entire index, running queries will do a
> > better job of filling the cache intelligently.  The first bunch of
> > queries will be slow.
> >
> > Summary: You need more RAM.  Quite a bit more RAM.
> >
> > Thanks,
> > Shawn
> >
>
