bq: Will docValues help with memory usage?

I'm still a bit fuzzy on all the ramifications of DocValues, but I
somewhat doubt they'll result in index size savings. They _really_
help with loading the values for a field, but the end result is still
the values in memory....

People who know what they're talking about, _please_ correct this if
I'm off base.

Sure, stored field compression will help with disk space, no question.
I was mostly cautioning against extrapolating from disk size to memory
requirements without taking this into account.


Best
Erick

On Tue, May 7, 2013 at 6:46 AM, Annette Newton
<annette.new...@servicetick.com> wrote:
> Hi Erick,
>
> Thanks for the tip.
>
> Will docValues help with memory usage?  It seemed a bit complicated to
> set up...
>
> The index size saving was nice because that means that potentially I could
> use smaller provisioned IOP volumes which cost less...
>
> Thanks.
>
>
> On 3 May 2013 18:27, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Annette:
>>
>> Be a little careful with the index size savings; they really don't
>> mean much for _searching_. The stored field compression
>> significantly reduces the size on disk, but only for the stored
>> data, which is only accessed when returning the top N docs. In
>> terms of how many docs you can fit on your hardware, it's pretty
>> irrelevant.
>>
>> The *.fdt and *.fdx files in your index directory contain the stored
>> data, so when looking at the effects of various options (including
>> compression), you can pretty much ignore these files.
>>
>> FWIW,
>> Erick
>>
>> On Fri, May 3, 2013 at 2:03 AM, Annette Newton
>> <annette.new...@servicetick.com> wrote:
>> > Thanks Shawn.
>> >
>> > I have played around with soft commits before and didn't see any
>> > improvement, but with the current load testing I am doing I will give it
>> > another go.
>> >
>> > I have researched docValues and came across the fact that it would
>> > increase the index size.  With the upgrade to 4.2.1 the index size has
>> > reduced by approx 33%, which is pleasing, and I don't really want to
>> > lose that saving.
>> >
>> > We do use the facet.method=enum approach, which works really well, but
>> > I will verify that we are using it in every instance; we have numerous
>> > developers working on the product and maybe one or two have slipped
>> > through.
>> >
>> > Right from the start I upped the zkClientTimeout to 30 as I wanted to
>> > give extra time for any network blips that we experience on AWS.  We
>> > only seem to drop communication on a full garbage collection, though.
>> >
>> > I am coming to the conclusion that we need to have more shards to cope
>> > with the writes, so I will play around with adding more shards and see
>> > how I go.
>> >
>> >
>> > I appreciate you having a look over our setup and the advice.
>> >
>> > Thanks again.
>> >
>> > Netty.
>> >
>> >
>> > On 2 May 2013 23:17, Shawn Heisey <s...@elyograg.org> wrote:
>> >
>> >> On 5/2/2013 4:24 AM, Annette Newton wrote:
>> >> > Hi Shawn,
>> >> >
>> >> > Thanks so much for your response.  We basically are very write
>> >> > intensive and write throughput is pretty essential to our product.
>> >> > Reads are sporadic and are actually functioning really well.
>> >> >
>> >> > We write on average (at the moment) 8-12 batches of 35 documents per
>> >> > minute.  But we really will be looking to write more in the future, so
>> >> > we need to work out how to scale Solr and cope with more volume.
>> >> >
>> >> > Schema (I have changed the names):
>> >> >
>> >> > http://pastebin.com/x1ry7ieW
>> >> >
>> >> > Config:
>> >> >
>> >> > http://pastebin.com/pqjTCa7L
>> >>
>> >> This is very clean.  There's probably more you could remove or comment
>> >> out, but generally speaking I couldn't find any glaring issues.  In
>> >> particular, you have disabled autowarming, which is good: autowarming
>> >> is a major contributor to commit speed problems.
>> >>
>> >> The first thing I think I'd try is increasing zkClientTimeout to 30 or
>> >> 60 seconds.  You can use the startup commandline or solr.xml, I would
>> >> probably use the latter.  Here's a solr.xml fragment that uses a system
>> >> property or a 15 second default:
>> >>
>> >> <?xml version="1.0" encoding="UTF-8" ?>
>> >> <solr persistent="true" sharedLib="lib">
>> >>   <cores adminPath="/admin/cores"
>> >>          zkClientTimeout="${zkClientTimeout:15000}"
>> >>          hostPort="${jetty.port:}"
>> >>          hostContext="solr">
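>> >>
>> >> With that fragment in place, you could pass -DzkClientTimeout=30000 (or
>> >> 60000) on the Java commandline at startup to override the 15 second
>> >> default.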
>> >>
>> >> General thoughts, these changes might not help this particular issue:
>> >> You've got autoCommit with openSearcher=true.  This is a hard commit.
>> >> If it were me, I would set that up with openSearcher=false and either do
>> >> explicit soft commits from my application or set up autoSoftCommit with
>> >> a shorter timeframe than autoCommit.
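>> >>
>> >> A rough sketch of that in solrconfig.xml (inside <updateHandler>; the
>> >> times here are just illustrative, tune them to your indexing pattern):
>> >>
>> >> <autoCommit>
>> >>   <maxTime>300000</maxTime>
>> >>   <openSearcher>false</openSearcher>
>> >> </autoCommit>
>> >>
>> >> <autoSoftCommit>
>> >>   <maxTime>60000</maxTime>
>> >> </autoSoftCommit>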
>> >>
>> >> This might simply be a scaling issue, where you'll need to spread the
>> >> load wider than four shards.  I know that there are financial
>> >> considerations with that, and they might not be small, so let's leave
>> >> that alone for now.
>> >>
>> >> The memory problems might be a symptom/cause of the scaling issue I just
>> >> mentioned.  You said you're using facets, which can be a real memory hog
>> >> even with only a few of them.  Have you tried facet.method=enum to see
>> >> how it performs?  You'd need to switch to it exclusively, never go with
>> >> the default of fc.  You could put that in the defaults or invariants
>> >> section of your request handler(s).
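>> >>
>> >> For example, something like this in the handler definition in
>> >> solrconfig.xml would force it everywhere (the handler name is just a
>> >> placeholder for whatever you already use):
>> >>
>> >> <requestHandler name="/select" class="solr.SearchHandler">
>> >>   <lst name="invariants">
>> >>     <str name="facet.method">enum</str>
>> >>   </lst>
>> >> </requestHandler>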
>> >>
>> >> Another way to reduce memory usage for facets is to use disk-based
>> >> docValues on version 4.2 or later for the facet fields, but this will
>> >> increase your index size, and your index is already quite large.
>> >> Depending on your index contents, the increase may be small or large.
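>> >>
>> >> The setup itself is mostly a one-attribute schema change plus a full
>> >> reindex, something like this on each facet field (field and type names
>> >> here are just placeholders):
>> >>
>> >> <field name="category" type="string" indexed="true" stored="false"
>> >>        docValues="true"/>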
>> >>
>> >> Something to just mention: It looks like your solrconfig.xml has
>> >> hard-coded absolute paths for dataDir and updateLog.  This is fine if
>> >> you'll only ever have one core/collection on each server, but it'll be a
>> >> disaster if you have multiples.  I could be wrong about how these get
>> >> interpreted in SolrCloud -- they might actually be relative despite
>> >> starting with a slash.
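>> >>
>> >> One way to keep those per-core (a sketch along the lines of the stock
>> >> example config) is property substitution rather than absolute paths:
>> >>
>> >> <dataDir>${solr.data.dir:}</dataDir>
>> >>
>> >> <updateLog>
>> >>   <str name="dir">${solr.ulog.dir:}</str>
>> >> </updateLog>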
>> >>
>> >> Thanks,
>> >> Shawn
>> >>
>> >>
>> >
>> >
>>
>
>
>
> --
>
> Annette Newton
>
> Database Administrator
>
> ServiceTick Ltd
>
>
>
> T:+44(0)1603 618326
>
>
>
> Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ
>
> www.servicetick.com
>
> *www.sessioncam.com*
>
