bq: Will docValues help with memory usage?

I'm still a bit fuzzy on all the ramifications of DocValues, but I somewhat doubt they'll result in index size savings. They _really_ help with loading the values for a field, but the end result is still the values in memory....
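As far as setup goes, it's mostly a per-field attribute in schema.xml plus a full reindex. A minimal sketch, with a made-up field name rather than anything from your actual schema:

  <!-- hypothetical facet field; adding docValues="true" requires reindexing -->
  <field name="facet_field_s" type="string" indexed="true" stored="false"
         docValues="true"/>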
People who know what they're talking about, _please_ correct this if I'm off base.

Sure, stored field compression will help with disk space, no question. I was mostly cautioning against extrapolating from disk size to memory requirements without taking this into account.

Best,
Erick

On Tue, May 7, 2013 at 6:46 AM, Annette Newton <annette.new...@servicetick.com> wrote:
> Hi Erick,
>
> Thanks for the tip.
>
> Will docValues help with memory usage? It seemed a bit complicated to set up.
>
> The index size saving was nice because it means that potentially I could use smaller provisioned IOPS volumes, which cost less...
>
> Thanks.
>
> On 3 May 2013 18:27, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Annette:
>>
>> Be a little careful with the index size savings; they really don't mean much for _searching_. The stored field compression significantly reduces the size on disk, but only for the stored data, which is only accessed when returning the top N docs. In terms of how many docs you can fit on your hardware, it's pretty irrelevant.
>>
>> The *.fdt and *.fdx files in your index directory contain the stored data, so when looking at the effects of various options (including compression), you can pretty much ignore these files.
>>
>> FWIW,
>> Erick
>>
>> On Fri, May 3, 2013 at 2:03 AM, Annette Newton <annette.new...@servicetick.com> wrote:
>> > Thanks Shawn.
>> >
>> > I have played around with soft commits before and didn't seem to see any improvement, but with the current load testing I am doing I will give it another go.
>> >
>> > I have researched docValues and came across the fact that it would increase the index size. With the upgrade to 4.2.1 the index size has reduced by approx 33%, which is pleasing, and I don't really want to lose that saving.
>> >
>> > We do use the facet.method=enum approach, which works really well, but I will verify that we are using it in every instance; we have numerous developers working on the product and maybe one or two have slipped through.
>> >
>> > Right from the first I upped the zkClientTimeout to 30 seconds, as I wanted to give extra time for any network blips that we experience on AWS. We only seem to drop communication on a full garbage collection though.
>> >
>> > I am coming to the conclusion that we need to have more shards to cope with the writes, so I will play around with adding more shards and see how I go.
>> >
>> > I appreciate you having a look over our setup and the advice.
>> >
>> > Thanks again.
>> >
>> > Netty.
>> >
>> > On 2 May 2013 23:17, Shawn Heisey <s...@elyograg.org> wrote:
>> >
>> >> On 5/2/2013 4:24 AM, Annette Newton wrote:
>> >> > Hi Shawn,
>> >> >
>> >> > Thanks so much for your response. We basically are very write intensive, and write throughput is pretty essential to our product. Reads are sporadic and are actually functioning really well.
>> >> >
>> >> > We write on average (at the moment) 8-12 batches of 35 documents per minute. But we really will be looking to write more in the future, so we need to work out scaling of Solr and how to cope with more volume.
>> >> >
>> >> > Schema (I have changed the names):
>> >> >
>> >> > http://pastebin.com/x1ry7ieW
>> >> >
>> >> > Config:
>> >> >
>> >> > http://pastebin.com/pqjTCa7L
>> >>
>> >> This is very clean. There's probably more you could remove/comment, but generally speaking I couldn't find any glaring issues. In particular, you have disabled autowarming, which is a major contributor to commit speed problems.
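A side note for anyone skimming the thread: "disabled autowarming" just means autowarmCount="0" on the caches in solrconfig.xml. A sketch with made-up sizes, not the values from the pastebin config:

  <filterCache class="solr.FastLRUCache"
               size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="0"/>

Raising autowarmCount trades slower commits for warmer caches each time a new searcher opens.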
>> >> The first thing I think I'd try is increasing zkClientTimeout to 30 or 60 seconds. You can use the startup command line or solr.xml; I would probably use the latter. Here's a solr.xml fragment that uses a system property or a 15-second default:
>> >>
>> >> <?xml version="1.0" encoding="UTF-8" ?>
>> >> <solr persistent="true" sharedLib="lib">
>> >>   <cores adminPath="/admin/cores"
>> >>          zkClientTimeout="${zkClientTimeout:15000}"
>> >>          hostPort="${jetty.port:}"
>> >>          hostContext="solr">
>> >>
>> >> General thoughts (these changes might not help this particular issue): you've got autoCommit with openSearcher=true. This is a hard commit. If it were me, I would set that up with openSearcher=false and either do explicit soft commits from my application or set up autoSoftCommit with a shorter timeframe than autoCommit.
>> >>
>> >> This might simply be a scaling issue, where you'll need to spread the load wider than four shards. I know that there are financial considerations with that, and they might not be small, so let's leave that alone for now.
>> >>
>> >> The memory problems might be a symptom/cause of the scaling issue I just mentioned. You said you're using facets, which can be a real memory hog even with only a few of them. Have you tried facet.method=enum to see how it performs? You'd need to switch to it exclusively, never go with the default of fc. You could put that in the defaults or invariants section of your request handler(s).
>> >>
>> >> Another way to reduce memory usage for facets is to use disk-based docValues on version 4.2 or later for the facet fields, but this will increase your index size, and your index is already quite large. Depending on your index contents, the increase may be small or large.
>> >>
>> >> Something to just mention: it looks like your solrconfig.xml has hard-coded absolute paths for dataDir and updateLog. This is fine if you'll only ever have one core/collection on each server, but it'll be a disaster if you have multiples. I could be wrong about how these get interpreted in SolrCloud -- they might actually be relative despite starting with a slash.
>> >>
>> >> Thanks,
>> >> Shawn
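To make a couple of Shawn's suggestions concrete, here's roughly what they look like in solrconfig.xml. This is only a sketch -- the times and the handler name are illustrative, not values from the pastebin config:

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- hard commit: flush to disk regularly, but don't open a new searcher -->
    <autoCommit>
      <maxTime>60000</maxTime>                <!-- illustrative value -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- soft commit: controls how quickly new documents become visible -->
    <autoSoftCommit>
      <maxTime>5000</maxTime>                 <!-- illustrative value -->
    </autoSoftCommit>
  </updateHandler>

  <!-- force facet.method=enum for every request through this handler -->
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="invariants">
      <str name="facet.method">enum</str>
    </lst>
  </requestHandler>

Putting facet.method in "invariants" rather than "defaults" means a stray request parameter can't silently switch a query back to the fc method.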
> --
> Annette Newton
> Database Administrator
> ServiceTick Ltd
> T: +44 (0)1603 618326
> Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ
> www.servicetick.com
> *www.sessioncam.com*
>
> --
> *This message is confidential and is intended to be read solely by the addressee. The contents should not be disclosed to any other person or copies taken unless authorised to do so. If you are not the intended recipient, please notify the sender and permanently delete this message. As Internet communications are not secure, ServiceTick accepts neither legal responsibility for the contents of this message nor responsibility for any change made to this message after it was forwarded by the original author.*