Wouldn't it be better to do a RELOAD? http://wiki.apache.org/solr/CoreAdmin#RELOAD
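For illustration, here is a minimal Python sketch of that call. It assumes the base URL and core name from the curl example quoted further down (http://localhost:5005/solr and collection1); the helper names reload_url and reload_core are made up for this example and are not part of Solr or SolrJ.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def reload_url(base_url, core, wt="json"):
    """Build the CoreAdmin RELOAD URL for a single core (hypothetical helper)."""
    query = urlencode({"action": "RELOAD", "core": core, "wt": wt})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


def reload_core(base_url, core, timeout=30):
    """Send the RELOAD request and return the raw response body as text."""
    with urlopen(reload_url(base_url, core), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call against the read-only instance (values assumed from this thread):
# reload_core("http://localhost:5005/solr", "collection1")
```

The write instance could run something like this after its own commit, which is the same thing the postCommit curl listener quoted below does.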
Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906
appinions inc.
“The Science of Influence Marketing”
18 East 41st Street
New York, NY 10017
t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions
w: appinions.com <http://www.appinions.com/>

On Tue, Jul 2, 2013 at 5:05 PM, Peter Sturge <peter.stu...@gmail.com> wrote:

> The RO instance commit isn't (or shouldn't be) doing any real writing, just an empty commit to force new searchers, autowarm/refresh caches etc.
> Admittedly, we do all this on 3.6, so 4.0 could have different behaviour in this area.
> As long as you don't have autocommit in solrconfig.xml, there wouldn't be any commits 'behind the scenes' (we do all our commits via a local solrj client so it can be fully managed).
> The only caveat might be NRT/soft commits, but I'm not too familiar with this in 4.0.
> In any case, your RO instance must be getting updated somehow, otherwise how would it know your write instance made any changes?
> Perhaps your write instance notifies the RO instance externally from Solr? (a perfectly valid approach, and one that would allow a 'single' lock to work without contention)
>
> On Tue, Jul 2, 2013 at 7:59 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
>
> > Interesting, we are running 4.0 - and solr will refuse the start (or reload) the core. But from looking at the code I am not seeing it is doing any writing - but I should digg more...
> >
> > Are you sure it needs to do writing? Because I am not calling commits, in fact I have deactivated *all* components that write into index, so unless there is something deep inside, which automatically calls the commit, it should never happen.
> >
> > roman
> >
> > On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge <peter.stu...@gmail.com> wrote:
> >
> > > Hmmm, single lock sounds dangerous. It probably works ok because you've been [un]lucky.
> > > For example, even with a RO instance, you still need to do a commit in order to reload caches/changes from the other instance.
> > > What happens if this commit gets called in the middle of the other instance's commit? I've not tested this scenario, but it's very possible with a 'single' lock the results are indeterminate.
> > > If the 'single' lock mechanism is making assumptions e.g. no other process will interfere, and then one does, the Lucene index could very well get corrupted.
> > >
> > > For the error you're seeing using 'native', we use native lockType for both write and RO instances, and it works fine - no contention.
> > > Which version of Solr are you using? Perhaps there's been a change in behaviour?
> > >
> > > Peter
> > >
> > > On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> > >
> > > > as i discovered, it is not good to use 'native' locktype in this scenario, actually there is a note in the solrconfig.xml which says the same
> > > >
> > > > when a core is reloaded and solr tries to grab lock, it will fail - even if the instance is configured to be read-only, so i am using 'single' lock for the readers and 'native' for the writer, which seems to work OK
> > > >
> > > > roman
> > > >
> > > > On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> > > >
> > > > > I have auto commit after 40k RECs/1800secs. But I only tested with manual commit, but I don't see why it should work differently.
> > > > > Roman
> > > > >
> > > > > On 7 Jun 2013 20:52, "Tim Vaillancourt" <t...@elementspace.com> wrote:
> > > > >
> > > > >> If it makes you feel better, I also considered this approach when I was in the same situation with a separate indexer and searcher on one Physical linux machine.
> > > > >>
> > > > >> My main concern was "re-using" the FS cache between both instances - If I replicated to myself there would be two independent copies of the index, FS-cached separately.
> > > > >>
> > > > >> I like the suggestion of using autoCommit to reload the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work?
> > > > >>
> > > > >> Best of luck!
> > > > >>
> > > > >> Tim
> > > > >>
> > > > >> On 5 June 2013 10:19, Roman Chyla <roman.ch...@gmail.com> wrote:
> > > > >>
> > > > >> > So here it is for a record how I am solving it right now:
> > > > >> >
> > > > >> > Write-master is started with: -Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master=http://localhost:5005
> > > > >> > Read-master is started with: -Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false
> > > > >> >
> > > > >> > solrconfig.xml changes:
> > > > >> >
> > > > >> > 1. all index changing components have this bit, enable="${montysolr.master:true}" - ie.
> > > > >> >
> > > > >> > <updateHandler class="solr.DirectUpdateHandler2"
> > > > >> >     enable="${montysolr.master:true}">
> > > > >> >
> > > > >> > 2. for cache warming de/activation
> > > > >> >
> > > > >> > <listener event="newSearcher"
> > > > >> >     class="solr.QuerySenderListener"
> > > > >> >     enable="${montysolr.enable.warming:true}">...
> > > > >> >
> > > > >> > 3. to trigger refresh of the read-only-master (from write-master):
> > > > >> >
> > > > >> > <listener event="postCommit"
> > > > >> >     class="solr.RunExecutableListener"
> > > > >> >     enable="${montysolr.master:true}">
> > > > >> >   <str name="exe">curl</str>
> > > > >> >   <str name="dir">.</str>
> > > > >> >   <bool name="wait">false</bool>
> > > > >> >   <arr name="args"> <str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&action=RELOAD&core=collection1</str></arr>
> > > > >> > </listener>
> > > > >> >
> > > > >> > This works, I still don't like the reload of the whole core, but it seems like the easiest thing to do now.
> > > > >> >
> > > > >> > -- roman
> > > > >> >
> > > > >> > On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> > > > >> >
> > > > >> > > Hi Peter,
> > > > >> > >
> > > > >> > > Thank you, I am glad to read that this usecase is not alien.
> > > > >> > >
> > > > >> > > I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write.
> > > > >> > >
> > > > >> > > (being lazy ;)) I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call the curl after commit, or write some IndexReaderFactory that checks for changes
> > > > >> > >
> > > > >> > > The problem with calling the 'core reload' - is that it seems lots of work for just opening a new searcher, eeekkk...somewhere I read that it is cheap to reload a core, but re-opening the index searches must be definitely cheaper...
> > > > >> > >
> > > > >> > > roman
> > > > >> > >
> > > > >> > > On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge <peter.stu...@gmail.com> wrote:
> > > > >> > >
> > > > >> > >> Hi,
> > > > >> > >> We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching).
> > > > >> > >> To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents.
> > > > >> > >> This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance.
> > > > >> > >> Also, make sure to use <lockType>native</lockType> in solrconfig.xml to ensure the two instances don't try to commit at the same time.
> > > > >> > >> There are several ways to trigger a commit:
> > > > >> > >> Call commit() periodically within your own code.
> > > > >> > >> Use autoCommit in solrconfig.xml.
> > > > >> > >> Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then call commit when called (more complex coding, but good if the index changes on an ad-hoc basis).
> > > > >> > >> Note, doing things this way isn't really suitable for an NRT environment.
> > > > >> > >>
> > > > >> > >> HTH,
> > > > >> > >> Peter
> > > > >> > >>
> > > > >> > >> On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> > > > >> > >>
> > > > >> > >> > Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, it has many cores. I want to run 2 instances of solr because I think it has these benefits:
> > > > >> > >> >
> > > > >> > >> > 1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB)
> > > > >> > >> > 2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges)
> > > > >> > >> > 3) saving disk space and better OS caching (OS should be able to use more RAM for the caching, which should result in faster operations - the two processes are accessing the same index)
> > > > >> > >> >
> > > > >> > >> > Maybe I should just forget it and go with the replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged)
> > > > >> > >> >
> > > > >> > >> > Am I not seeing something?
> > > > >> > >> >
> > > > >> > >> > roman
> > > > >> > >> >
> > > > >> > >> > On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <jhell...@innoventsolutions.com> wrote:
> > > > >> > >> >
> > > > >> > >> > > Roman,
> > > > >> > >> > >
> > > > >> > >> > > Could you be more specific as to why replication doesn't meet your requirements? It was geared explicitly for this purpose, including the automatic discovery of changes to the data on the index master.
> > > > >> > >> > >
> > > > >> > >> > > Jason
> > > > >> > >> > >
> > > > >> > >> > > On Jun 4, 2013, at 1:50 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> > > > >> > >> > >
> > > > >> > >> > > > OK, so I have verified the two instances can run alongside, sharing the same datadir
> > > > >> > >> > > >
> > > > >> > >> > > > All update handlers are unaccessible in the read-only master
> > > > >> > >> > > >
> > > > >> > >> > > > <updateHandler class="solr.DirectUpdateHandler2"
> > > > >> > >> > > >     enable="${solr.can.write:true}">
> > > > >> > >> > > >
> > > > >> > >> > > > java -Dsolr.can.write=false .....
> > > > >> > >> > > >
> > > > >> > >> > > > And I can reload the index manually:
> > > > >> > >> > > >
> > > > >> > >> > > > curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
> > > > >> > >> > > >
> > > > >> > >> > > > But this is not an ideal solution; I'd like for the read-only server to discover index changes on its own. Any pointers?
> > > > >> > >> > > >
> > > > >> > >> > > > Thanks,
> > > > >> > >> > > >
> > > > >> > >> > > > roman
> > > > >> > >> > > >
> > > > >> > >> > > > On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> > > > >> > >> > > >
> > > > >> > >> > > >> Hello,
> > > > >> > >> > > >>
> > > > >> > >> > > >> I need your expert advice. I am thinking about running two instances of solr that share the same datadirectory. The *reason* being: indexing instance is constantly building cache after every commit (we have a big cache) and this slows it down. But indexing doesn't need much RAM, only the search does (and server has lots of CPUs)
> > > > >> > >> > > >>
> > > > >> > >> > > >> So, it is like having two solr instances
> > > > >> > >> > > >>
> > > > >> > >> > > >> 1. solr-indexing-master
> > > > >> > >> > > >> 2. solr-read-only-master
> > > > >> > >> > > >>
> > > > >> > >> > > >> In the solrconfig.xml I can disable update components, It should be fine. However, I don't know how to 'trigger' index re-opening on (2) after the commit happens on (1).
> > > > >> > >> > > >>
> > > > >> > >> > > >> Ideally, the second instance could monitor the disk and re-open disk after new files appear there. Do I have to implement custom IndexReaderFactory? Or something else?
> > > > >> > >> > > >>
> > > > >> > >> > > >> Please note: I know about the replication, this usecase is IMHO slightly different - in fact, write-only-master (1) is also a replication master
> > > > >> > >> > > >>
> > > > >> > >> > > >> Googling turned out only this http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 - no pointers there.
> > > > >> > >> > > >>
> > > > >> > >> > > >> But If I am approaching the problem wrongly, please don't hesitate to 're-educate' me :)
> > > > >> > >> > > >>
> > > > >> > >> > > >> Thanks!
> > > > >> > >> > > >>
> > > > >> > >> > > >> roman