I think you're right, but you can specify a default value in your schema.xml
to at least see if this is a good path to follow.
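Something along these lines is what I have in mind (just a sketch, with the
field name and type invented for illustration):

  <field name="boost_value" type="int" indexed="true" stored="true"
         docValues="true" default="0"/>

The default="0" means documents that never supply a value still get one
written, which should sidestep the "every document needs a value"
restriction you mention below, at least until 4.5 lands.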

Best,
Erick

On Fri, Sep 27, 2013 at 3:46 AM, Neil Prosser <neil.pros...@gmail.com> wrote:
> Good point. I'd seen docValues and wondered whether they might be of use in
> this situation. However, as I understand it they require a value to be set
> for all documents until Solr 4.5. Is that true or was I imagining reading
> that?
>
>
> On 25 September 2013 11:36, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Hmmmm, I confess I haven't had a chance to play with this yet,
>> but have you considered docValues for some of your fields? See:
>> http://wiki.apache.org/solr/DocValues
>>
>> And just to tantalize you:
>>
>> > Since Solr 4.2, to build a forward index for a field, for purposes of
>> > sorting, faceting, grouping, function queries, etc.
>>
>> > You can specify a different docValuesFormat on the fieldType
>> > (docValuesFormat="Disk") to only load minimal data on the heap, keeping
>> > other data structures on disk.
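>>
>> In schema.xml terms that's something like the following. This is just a
>> sketch (the names are made up and I haven't tried the Disk format myself);
>> I believe you also need <codecFactory class="solr.SchemaCodecFactory"/> in
>> solrconfig.xml for per-field formats to take effect:
>>
>>   <fieldType name="string_dv_disk" class="solr.StrField"
>>              docValues="true" docValuesFormat="Disk"/>
>>   <field name="category" type="string_dv_disk"
>>          indexed="true" stored="true"/>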
>>
>> Do note, though:
>> > Not a huge improvement for a static index
>>
>> This latter point isn't a problem, though, since you don't have a static index....
>>
>> Erick
>>
>> On Tue, Sep 24, 2013 at 4:13 AM, Neil Prosser <neil.pros...@gmail.com>
>> wrote:
>> > Shawn: unfortunately the current problems are with facet.method=enum!
>> >
>> > Erick: We already round our date queries so they're the same for at least
>> > an hour, so thankfully our fq entries will be reusable. However, I'll take
>> > a look at reducing the cache and autowarm counts and see what the effect
>> > on hit ratios and performance is.
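>> >
>> > For reference, the sort of rounding I mean is along these lines (the
>> > field name is just an example):
>> >
>> >   fq=published_date:[NOW/HOUR-24HOURS TO NOW/HOUR]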
>> >
>> > For SolrCloud our soft commit interval is 15 seconds and our hard commit
>> > (openSearcher=false) interval is 15 minutes.
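>> >
>> > In solrconfig.xml terms that's roughly the following (paraphrasing from
>> > memory, so treat the exact values as illustrative):
>> >
>> >   <autoCommit>
>> >     <maxTime>900000</maxTime>          <!-- 15 minutes -->
>> >     <openSearcher>false</openSearcher>
>> >   </autoCommit>
>> >   <autoSoftCommit>
>> >     <maxTime>15000</maxTime>           <!-- 15 seconds -->
>> >   </autoSoftCommit>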
>> >
>> > You're right about those sorted fields having a lot of unique values. They
>> > can be any number between 0 and 10,000,000 (it's sparsely populated across
>> > the documents) and could appear in several variants across multiple
>> > documents. This is probably a good area for seeing what we can bend with
>> > regard to our requirements for sorting/boosting. I've just looked at two
>> > shards and they've each got upwards of 1000 terms showing in the schema
>> > browser for just one of the (potentially 60) fields.
>> >
>> >
>> >
>> > On 21 September 2013 20:07, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>> >
>> >> About caches. The queryResultCache is only useful when you expect a
>> >> number of _identical_ queries. Think of this cache as a map where the
>> >> key is the query and the value is just a list of N internal document
>> >> IDs, where N is your window size. Paging is often the place where this
>> >> is used. Take a look at your admin page for this cache; you can see the
>> >> hit rates there. The take-away is that this is a very small cache
>> >> memory-wise, so varying it is probably not a great predictor of memory
>> >> usage.
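>> >>
>> >> To make that concrete, the solrconfig.xml entry looks something like
>> >> this (sizes are purely illustrative, not a recommendation):
>> >>
>> >>   <queryResultCache class="solr.LRUCache"
>> >>                     size="512"
>> >>                     initialSize="512"
>> >>                     autowarmCount="32"/>
>> >>   <queryResultWindowSize>20</queryResultWindowSize>
>> >>
>> >> where queryResultWindowSize is the N above.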
>> >>
>> >> The filterCache is more intense memory-wise: it's another map where the
>> >> key is the fq clause and the value is bounded by maxDoc/8 bytes. Take a
>> >> close look at this in the admin screen and see what the hit ratio is. It
>> >> may be that you can make it much smaller and still get a lot of benefit,
>> >> _especially_ considering it could occupy about 44GB of memory:
>> >> (43,000,000 / 8) * 8192. And the autowarm count is excessive in most
>> >> cases from what I've seen. Cutting the autowarm down to, say, 16 may not
>> >> make a noticeable difference in your response time. And if you're using
>> >> NOW in your fq clauses, the cache is almost totally useless, see:
>> >> http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/
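>> >>
>> >> In other words, something in this direction (numbers pulled out of thin
>> >> air, so measure against your hit ratio before settling on anything):
>> >>
>> >>   <filterCache class="solr.FastLRUCache"
>> >>                size="512"
>> >>                initialSize="512"
>> >>                autowarmCount="16"/>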
>> >>
>> >> Also, read Uwe's excellent blog about MMapDirectory here:
>> >> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>> >> for some problems with over-allocating memory to the JVM. Of course
>> >> if you're hitting OOMs, well.....
>> >>
>> >> bq: order them by one of their fields.
>> >> This is one place I'd look first. How many unique values are in each
>> >> field that you sort on? This is one of the major memory consumers. You
>> >> can get a sense of this by looking at admin/schema-browser and selecting
>> >> the fields you sort on. There's a text box with the number of terms
>> >> returned, then a / ### where ### is the total count of unique terms in
>> >> the field. NOTE: in 4.4 this will be -1 for multiValued fields, but you
>> >> shouldn't be sorting on those anyway. How many fields are you sorting
>> >> on, and of what types?
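>> >>
>> >> If you'd rather script this than click through the UI, the Luke request
>> >> handler reports the same numbers, something like
>> >> http://host:8983/solr/core/admin/luke?fl=your_sort_field and then look
>> >> at the "distinct" entry for the field. I'm quoting the parameter and key
>> >> names from memory, so double-check them against your version.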
>> >>
>> >> For your SolrCloud experiments, what are your soft and hard commit
>> >> intervals? Because something is really screwy here. Sharding down to
>> >> this few docs per shard should be fast. Back to the point above, the
>> >> only good explanation I can come up with at this remove is that the
>> >> fields you sort on have a LOT of unique values. It's possible that the
>> >> total number of unique values isn't scaling down with sharding. That is,
>> >> each shard may have, say, 90% of all unique terms (number from thin
>> >> air). Worth checking anyway, but a stretch.
>> >>
>> >> This is definitely unusual...
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >>
>> >> On Thu, Sep 19, 2013 at 8:20 AM, Neil Prosser <neil.pros...@gmail.com>
>> >> wrote:
>> >> > Apologies for the giant email. Hopefully it makes sense.
>> >> >
>> >> > We've been trying out SolrCloud to solve some scalability issues with
>> >> > our current setup and have run into problems. I'd like to describe our
>> >> > current setup, our queries and the sort of load we see, and am hoping
>> >> > someone might be able to spot the massive flaw in the way I've been
>> >> > trying to set things up.
>> >> >
>> >> > We currently run Solr 4.0.0 with the old-style master/slave
>> >> > replication. We have five slaves, each running CentOS with 96GB of
>> >> > RAM, 24 cores and 48GB assigned to the JVM heap. Disks aren't crazy
>> >> > fast (i.e. not SSDs) but aren't slow either. Our GC parameters aren't
>> >> > particularly exciting, just -XX:+UseConcMarkSweepGC. Java version is
>> >> > 1.7.0_11.
>> >> >
>> >> > Our index size ranges between 144GB and 200GB (when we optimise it
>> >> > back down, since we've had bad experiences with large cores). We've
>> >> > got just over 37M documents; some are smallish but most range between
>> >> > 1000 and 6000 bytes. We regularly update documents, so large portions
>> >> > of the index will be touched, leading to a maxDoc value of around 43M.
>> >> >
>> >> > Query load ranges between 400req/s and 800req/s across the five
>> >> > slaves throughout the day, increasing and decreasing gradually over a
>> >> > period of hours rather than bursting.
>> >> >
>> >> > Most of our documents have upwards of twenty fields. We use different
>> >> > fields to store territory-variant values (we have around 30
>> >> > territories) and also boost based on the values in some of these
>> >> > fields (the integer ones).
>> >> >
>> >> > So an average query can: range-filter by two of the territory-variant
>> >> > fields; filter by a non-territory-variant field; facet by a field or
>> >> > two (which may be territory-variant); bring back the values of 60
>> >> > fields; boost-query on the values of a non-territory-variant field;
>> >> > boost by the values of two territory-variant fields; and run a dismax
>> >> > query on up to 20 fields (with boosts) plus phrase boosts on those
>> >> > fields too. They're pretty big queries. We don't do any index-time
>> >> > boosting. We try to keep things dynamic so we can alter our boosts
>> >> > on-the-fly.
>> >> >
>> >> > Another common query is to list documents with a given set of IDs and
>> >> > select documents with a common reference and order them by one of
>> >> > their fields.
>> >> >
>> >> > Auto-commit every 30 minutes. Replication polls every 30 minutes.
>> >> >
>> >> > Document cache:
>> >> >   * initialSize - 32768
>> >> >   * size - 32768
>> >> >
>> >> > Filter cache:
>> >> >   * autowarmCount - 128
>> >> >   * initialSize - 8192
>> >> >   * size - 8192
>> >> >
>> >> > Query result cache:
>> >> >   * autowarmCount - 128
>> >> >   * initialSize - 8192
>> >> >   * size - 8192
>> >> >
>> >> > After a replicated core has finished downloading (probably while it's
>> >> > warming) we see requests which usually take around 100ms taking over
>> >> > 5s. GC logs show concurrent mode failure.
>> >> >
>> >> > I was wondering whether anyone can help with sizing the boxes required
>> >> > to split this index down into shards for use with SolrCloud and roughly
>> >> > how much memory we should be assigning to the JVM. Everything I've read
>> >> > suggests that running with a 48GB heap is way too high, but every
>> >> > attempt I've made to reduce the cache sizes seems to wind up causing
>> >> > out-of-memory problems. Even dropping all cache sizes by 50% and
>> >> > reducing the heap by 50% caused problems.
>> >> >
>> >> > I've already tried using SolrCloud with 10 shards (around 3.7M
>> >> > documents per shard, each with one replica) and kept the cache sizes
>> >> > low:
>> >> >
>> >> > Document cache:
>> >> >   * initialSize - 1024
>> >> >   * size - 1024
>> >> >
>> >> > Filter cache:
>> >> >   * autowarmCount - 128
>> >> >   * initialSize - 512
>> >> >   * size - 512
>> >> >
>> >> > Query result cache:
>> >> >   * autowarmCount - 32
>> >> >   * initialSize - 128
>> >> >   * size - 128
>> >> >
>> >> > Even when running on six machines in AWS with SSDs, a 24GB heap (out
>> >> > of 60GB of memory), and four shards on two boxes and three on the
>> >> > rest, I still see concurrent mode failure. This looks like it's
>> >> > causing ZooKeeper to mark the node as down, and things begin to
>> >> > struggle.
>> >> >
>> >> > Is concurrent mode failure just something that will inevitably happen,
>> >> > or is it avoidable by dropping the CMSInitiatingOccupancyFraction?
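>> >> >
>> >> > For concreteness, I'm imagining adding something like the following to
>> >> > our current single flag (values lifted from various blog posts rather
>> >> > than tested by us, so please correct me if they're off base):
>> >> >
>> >> >   -XX:+UseConcMarkSweepGC
>> >> >   -XX:+UseParNewGC
>> >> >   -XX:CMSInitiatingOccupancyFraction=70
>> >> >   -XX:+UseCMSInitiatingOccupancyOnly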
>> >> >
>> >> > If anyone has anything that might shove me in the right direction I'd
>> >> > be very grateful. I'm wondering whether our set-up will just never
>> >> > work and maybe we're expecting too much.
>> >> >
>> >> > Many thanks,
>> >> >
>> >> > Neil
>> >>
>>
